How to detect if page has completely loaded in Selenium? - python

What I am trying to do is iterate over all select values and grab some data. The data is loaded via AJAX, so the page isn't reloading - only the data in the table changes. As you can see, I have tried to use an explicit wait, but that doesn't work - the select value can be "4" while the information in the table still belongs to the "3" value. How can I be sure that the page has really been updated?
There are some similar questions, but I didn't find a proper solution.
for i in range(N):
    try:
        # Select the i-th option and submit; the table is then refreshed via AJAX
        driver.find_element_by_xpath("//select[@id='search-type']/option[@value='%s']" % i).click()
        driver.find_element_by_name("submit").click()
    except:
        print(i)
        continue
    elem = driver.find_element_by_id("my_id")
    # Explicit wait: the select reports the new value, but the table may still show the old data
    wait.until(lambda driver: driver.find_element_by_id('search-type').get_attribute("value") == str(i))
    if no_results(get_bs_source(driver.page_source)):
        print("No results %s" % i)
    if is_limited(get_bs_source(driver.page_source)):
        print("LIMITED %s" % i)
    data += get_links_from_current_page(driver)
Of course I could use time.sleep(N), but that's not how I would like to do this.

Selenium uses the document ready state to determine whether a page has loaded. WebDriverWait already implements this kind of logic, so you should not need to use document.readyState directly! In most cases it is already waiting for you, for example:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

browser = webdriver.Firefox()
browser.get("http://something.com")
delay = 10  # seconds
# Wait until the element is present in the DOM (raises TimeoutException after `delay`)
WebDriverWait(browser, delay).until(EC.presence_of_element_located((By.ID, 'IdOfMyElement')))
In this case the document ready state is checked twice: first during browser.get, and a second time during the WebDriverWait.
Selenium implementation: https://github.com/SeleniumHQ/selenium/blob/01399fffecd5a20af6f31aed6cb0043c9c5cde65/java/client/src/com/thoughtworks/selenium/webdriven/commands/WaitForPageToLoad.java#L59-L73

Alternatively, you can query the ready state yourself through the JavaScript executor (C# example):
var javaScriptExecutor = Instance as IJavaScriptExecutor;
bool isPageLoaded = javaScriptExecutor.ExecuteScript("return document.readyState;").Equals("complete");
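If you do want to run the same check from Python, here is a minimal sketch of the idea using execute_script (the URL is just the placeholder from the example above):

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait

browser = webdriver.Firefox()
browser.get("http://something.com")

# Poll document.readyState through the JavaScript executor until the browser reports "complete"
WebDriverWait(browser, 10).until(
    lambda d: d.execute_script("return document.readyState") == "complete"
)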

Related

Can't select element from page [Selenium]

I have tried to scrape info from that site - specifically, from a table. Every time I run it, I get information that the elements don't exist.
https://polygonscan.com/token/0x64a795562b02830ea4e43992e761c96d208fc58d
I tried adding time.sleep(5) to my code, and a scroll-down function to load all the elements - both ineffective.
Do you have any advice for me?
EDIT
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

# Options
chrome_options = Options()
chrome_options.add_argument("--headless")

# Set driver
chrome_driver_path = r"C:\Users\kacpe\OneDrive\Pulpit\Python\Projekty\chromedriver.exe"
driver = webdriver.Chrome(chrome_driver_path, options=chrome_options)
driver.get("https://polygonscan.com/token/0x64a795562b02830ea4e43992e761c96d208fc58d")

try:
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.XPATH, "//table/tbody/tr[0]")))
    print(element)
except TimeoutException as e:
    print(e)
I added the code per your request. My main goal is to scrape content from the table at this site. I added explicit waits to my code and still can't select anything from that table - it looks like the script doesn't see anything in that area.
One way to try to solve it is to use the XPath of the element, or its relative position, so that Selenium always targets the same "line" and returns the "value" of the information you are searching for.
Ex1: find_element(By.XPATH, '//*[@id="wmd-input"]')  # in that case it's the input box on the page.
If that doesn't work, try this one:
Ex2: browser.implicitly_wait(30)  # sets a timeout so all the information from the page has time to load
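If it helps, here is a minimal sketch combining both suggestions, assuming the table is rendered in the main document (note also that XPath indexing is 1-based, so the first row is tr[1], not tr[0]):

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.implicitly_wait(30)  # Ex2: give elements up to 30 s to appear before failing
driver.get("https://polygonscan.com/token/0x64a795562b02830ea4e43992e761c96d208fc58d")

# Ex1: address the row by its XPath position (1-based)
first_row = driver.find_element(By.XPATH, "//table/tbody/tr[1]")
print(first_row.text)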

Wait for class to load value after clicking button selenium python

After the website is loaded, I successfully click a button which then generates some numbers in this class
<div class="styles__Value-sc-1bfbyy7-2 eVmhyz"></div>
but not instantly - it puts them in one by one. Selenium instantly grabs the first value that gets put into the class but doesn't wait for the other values to be added. Is there any way to wait for all the values to load before grabbing them?
Here is the python code I use for grabbing the value:
total = driver.find_element_by_xpath("//div[#class='styles__Value-sc-1bfbyy7-2 eVmhyz']").text
Selenium provides WebDriverWait for this:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions

browser = webdriver.Chrome()
delay = 5  # seconds
# <locator> is a (By.<strategy>, "value") tuple, e.g. built from the XPath in your question
total = WebDriverWait(browser, delay).until(expected_conditions.presence_of_element_located(<locator>))
I haven't tested it locally, but it may work. There is also a presence_of_all_elements_located method; you can find the details in the expected_conditions documentation.
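A quick sketch of that all-elements variant, reusing the class name from your locator (the Chrome driver, the "url" placeholder and the 5-second timeout are assumptions):

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions

browser = webdriver.Chrome()
browser.get("url")  # the page with the button from the question
delay = 5  # seconds

# Returns as soon as at least one matching element is present in the DOM
elements = WebDriverWait(browser, delay).until(
    expected_conditions.presence_of_all_elements_located(
        (By.CSS_SELECTOR, "div.styles__Value-sc-1bfbyy7-2.eVmhyz")))
values = [element.text for element in elements]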
Hope this helps!

Selecting autocomplete lists with Selenium - have to close one list before accessing another

I'm trying to write a Python script with Selenium to autocomplete a form.
There are several fields with auto-complete. I can fill these out and select as follows:
field1 = driver.find_element_by_id("field-1")
field1.send_keys("input")
driver.find_element_by_xpath("//ul[1]/li[1]") #this clicks on the first autocomplete option
This works fine.
However, in order to proceed to the next form, I have to first simulate a click away, and usually delay by 5-8 seconds:
driver.find_element_by_xpath("//body").click()
time.sleep(delay)
Delay is usually set at 8 seconds - any less and it seems to not work from time to time.
Is there a more efficient way to do this that avoids using a timed delay?
I thought that possibly I need to exit the form / let Selenium confirm that the autocomplete has been selected.
You can use WebDriverWait to wait for an element to be located in the new form. That way you don't have to declare an explicit delay, and the code resumes execution as soon as the new form becomes present.
Sample code
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException

browser = webdriver.Firefox()
browser.get("url")
delay = 3  # seconds

# Click away to close the autocomplete list
browser.find_element_by_xpath("//body").click()
try:
    myElem = WebDriverWait(browser, delay).until(EC.presence_of_element_located((By.ID, 'IdOfMyElement')))
    print("new form is ready!")
    # add your code here
except TimeoutException:
    print("Loading took too much time!")
You can also use any other locator strategy to find the element.
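For example, the same wait with a CSS selector instead of an ID (this reuses browser, delay, By and EC from the sample above; "form.next-step" is just a hypothetical selector):

# Same wait, different locator strategy; replace the selector with one from your page
myElem = WebDriverWait(browser, delay).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, "form.next-step")))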

Can't get rid of hardcoded delay even when Explicit Wait is already there

I've written some code in Python, in combination with Selenium, to parse different questions from quora.com. My scraper is doing its job at the moment. The thing is, I've had to use a hardcoded delay for the scraper to work, even though an explicit wait is already defined. As the page is an infinite-scrolling one, I tried to limit the scrolling to a fixed number of iterations. Now, I have two questions:
Why is wait.until(EC.staleness_of(page)) not working within my scraper? It is commented out now.
If I use something else instead of page = wait.until(EC.visibility_of_element_located((By.CLASS_NAME, "question_link"))), the scraper throws an error: can't focus element.
Btw, I do not wish to go for the page = driver.find_element_by_tag_name('body') option.
Here is what I've written so far:
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://www.quora.com/topic/C-programming-language")
wait = WebDriverWait(driver, 10)
page = wait.until(EC.visibility_of_element_located((By.CLASS_NAME, "question_link")))

for scroll in range(10):
    page.send_keys(Keys.PAGE_DOWN)
    time.sleep(2)

# wait.until(EC.staleness_of(page))

for item in wait.until(EC.visibility_of_all_elements_located((By.CLASS_NAME, "rendered_qtext"))):
    print(item.text)

driver.quit()
You can try the code below to load as much XHR content as possible and then parse the page:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

driver = webdriver.Chrome()
driver.get("https://www.quora.com/topic/C-programming-language")
wait = WebDriverWait(driver, 10)
page = wait.until(EC.visibility_of_element_located((By.CLASS_NAME, "question_link")))
links_counter = len(wait.until(EC.visibility_of_all_elements_located((By.CLASS_NAME, "question_link"))))

while True:
    page.send_keys(Keys.END)
    try:
        wait.until(lambda driver: len(driver.find_elements_by_class_name("question_link")) > links_counter)
        links_counter = len(driver.find_elements_by_class_name("question_link"))
    except TimeoutException:
        break

for item in wait.until(EC.visibility_of_all_elements_located((By.CLASS_NAME, "rendered_qtext"))):
    print(item.text)

driver.quit()
Here we scroll the page down and wait up to 10 seconds for more links to be loaded, or break out of the while loop if the number of links stays the same.
As for your questions:
wait.until(EC.staleness_of(page)) is not working because when you scroll the page down you don't get a new DOM - the XHR just adds more links to the existing DOM, so the first link (page) never becomes stale in this case
(I'm not quite confident about this, but...) I guess you can send keys only to nodes that can receive focus (ones a user could focus manually), e.g. links, input fields, textareas, buttons..., but not to content divisions (div), paragraphs (p), etc.
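If relying on a focusable element feels fragile, one alternative (my assumption, not part of the original approach) is to scroll with JavaScript instead of sending keys, since execute_script needs no focused element; it could replace page.send_keys(Keys.END) in the loop above:

# Scroll to the bottom of the page without requiring focus on any element
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")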

Scraper doesn't stop clicking on the next page button

I've written a script in Python, in combination with Selenium, to get some names and corresponding addresses displayed upon a search; the search keyword is "Saskatoon". However, the data in this case spans multiple pages. My script does almost everything except for one thing.
It keeps running even though there are no more pages to traverse. The last page still shows the "›" sign for the next-page option, and it is not grayed out.
Here is the link: Page_link
Search_keyword: Saskatoon (in the city/town field).
Here is what I've written:
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys

driver = webdriver.Chrome()
wait = WebDriverWait(driver, 10)
driver.get("above_link")
time.sleep(3)

search_input = driver.find_element_by_id("cityField")
search_input.clear()
search_input.send_keys("Saskatoon")
search_input.send_keys(Keys.ENTER)

while True:
    try:
        wait.until(EC.visibility_of_element_located((By.LINK_TEXT, "›"))).click()
        time.sleep(2)
    except:
        break

driver.quit()
BTW, I've taken the name-and-address part out of this script, since I suppose it is not relevant here. Thanks.
You can use the class attribute of the "›" button: on the last page it is "ng-scope disabled", while on the other pages it is "ng-scope":
wait.until(EC.visibility_of_element_located((By.XPATH, "//li[@class='ng-scope']/a[.='›']"))).click()
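A sketch of how that locator could slot into your existing while loop (this reuses wait, EC, By and time from your script and additionally imports TimeoutException; the 2-second sleep is kept from your code only to let the results render):

from selenium.common.exceptions import TimeoutException

while True:
    try:
        # Matches the "›" link only while its parent <li> has class "ng-scope";
        # on the last page the class becomes "ng-scope disabled", so the wait
        # times out and the loop stops.
        wait.until(EC.visibility_of_element_located(
            (By.XPATH, "//li[@class='ng-scope']/a[.='›']"))).click()
        time.sleep(2)
    except TimeoutException:
        break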
