Is there a way to set a timeout that waits until a specific element in the webpage loads?
Notice: I am talking about loading a webpage using the driver.get() method.
I tried setting the page load timeout to 10 s, for example, and checking whether the element is present, but if it is not present I have to load the page from the start.
Edit:
To be clear: I don't want to load the full URL.
I want driver.get() to load the URL only until the element is found and then stop loading the rest.
In your examples you simply used the driver.get() method, which loads the full URL and only then executes the next command. One way is to use driver.set_page_load_timeout().
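For reference, a rough sketch of that set_page_load_timeout() idea (the URL and the locator are placeholders; note that once the TimeoutException is handled, Chrome typically stops loading the page):
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException, TimeoutException
from selenium.webdriver.common.by import By

URL = "https://example.com"  # placeholder URL

driver = webdriver.Chrome()
driver.set_page_load_timeout(10)  # give up on the full page load after 10 seconds
try:
    driver.get(URL)
except TimeoutException:
    pass  # the page did not finish loading within 10 seconds
try:
    element = driver.find_element(By.XPATH, "//div")  # placeholder locator
except NoSuchElementException:
    element = None  # element not present yet; reload or retry as needed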
Edit:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
op = Options()
op.page_load_strategy = 'none'
driver = webdriver.Chrome(options=op)
driver.get(URL)
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located(
        (By.XPATH, '//div')))
driver.execute_script("window.stop();")
PS. You will keep getting off-topic answers if you post vague questions without a line of code.
WebDriver waits for a page to load by default. It does not wait for loading inside frames or for AJAX requests. That means when you use .get('url'), the browser waits until the page is completely loaded and then goes to the next command in the code. But when an AJAX request updates the page, WebDriver does not wait, and it is your responsibility to wait an appropriate amount of time for the page, or a part of it, to load; that is what the expected_conditions module is for.
When the element is not found you get an error, so you can use try until the element is found:
import time
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException

while True:
    try:
        element = driver.find_element(By.XPATH, "<Name or XPath or ...>")  # your locator here
        break  # stop retrying once the element is found
    except NoSuchElementException:
        time.sleep(1)
It keeps trying until the element is found.
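Alternatively, a minimal sketch using the expected_conditions module mentioned above (it reuses the driver from the snippet above, and the XPath is just an illustrative placeholder):
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Wait up to 10 seconds for the element to appear; WebDriverWait polls automatically
element = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.XPATH, "//div[@id='example']"))  # placeholder locator
)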
Related
I am having a terribly hard time referring to a certain "next page" button on a website that I am trying to scrape links from [https://www.sreality.cz/adresar?strana=2]. If you scroll down, you can see a red right-arrow button that you can click to go to the next page, so the website loads new dynamic content. Every approach seems to report the same exact error, and I don't know how I am supposed to point to the element without running into it.
This is the code that I currently have:
from selenium import webdriver
chromedriver_path = "/home/user/Dokumenty/iCloud/RealityScraper/chromedriver"
driver = webdriver.Chrome(chromedriver_path)
print("WebDriver Successfully Initialized")
driver.get("https://www.sreality.cz/adresar?strana=2")
links = driver.find_elements_by_css_selector("h2.title a")
nextPage = driver.find_element_by_css_selector("li.paging-item a.btn-paging-pn.icof.icon-arr-right.paging-next")
for link in links:
    print(link.get_attribute("href"))
nextPage.click()
The "nextPage" variable is holding a supposed value to be clicked on once the "links" variable search finishes scraping all the links from the company titles. However when I run this code I get an error :
selenium.common.exceptions.StaleElementReferenceException: Message:
stale element reference: element is not attached to the page document
I have been searching for various fixes online, but none of them seemed to resolve the issue. I think that the issue at this point is not caused by the element not loading quickly enough, but rather by Selenium having trouble finding the element because of a wrong reference.
Because of this I have tried using XPath to accurately point to the actual element, and so I changed the "nextPage" variable to:
nextPage = driver.find_element_by_xpath("""/html/body/div[2]/div[1]/div[2]/div[2]/div[4]/div/div/div/div[2]/div/div[2]/ul[1]/li[12]/a""")
This returns exactly the same error as stated above. I have been trying to find a solution for hours now and I can't understand where the issue lies. I would be grateful if anyone could explain to me what I am doing wrong. Thanks to anyone.
If you want to get all the ng-href attributes from every page, try the code below. Alternatively, you could look into their API.
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time

driver.get("https://www.sreality.cz/adresar?strana=2")
wait = WebDriverWait(driver, 10)
while True:
    try:
        links = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "h2.title > a")))
        #print(len(links))
        for link in links:
            print(link.get_attribute("ng-href"))
        nextPage = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "a.btn-paging-pn.icof.icon-arr-right.paging-next")))
        nextPage.click()
        time.sleep(10)
    except Exception as e:
        print(e)
        break
First of all, never use an absolute XPath; it breaks easily. Use a relative XPath instead.
Secondly, I think the error you are getting is because after clicking the "Next" button for the first time, a new page is loaded which has a different DOM structure, and that's why you are not able to find that element.
You can try searching for the element after every new page load (i.e. after clicking the "Next" button each time); see the loop sketch after the snippet below.
# imports
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By

# initialize
driver = webdriver.Chrome()
wait = WebDriverWait(driver, 20)
action = ActionChains(driver)

# Try to use the below code and see if it works.
Next_btn = wait.until(EC.presence_of_element_located((By.XPATH, '(//li[@class="paging-item"])[2]')))
action.move_to_element(Next_btn).click().perform()
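A minimal sketch of re-locating the button on every page, reusing the wait, EC, and By names defined in the snippet above and the CSS selector from the question (the loop stops once the wait times out, which is assumed to happen on the last page):
from selenium.common.exceptions import TimeoutException

while True:
    # re-locate the links and the "Next" button on the freshly loaded page
    links = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "h2.title a")))
    for link in links:
        print(link.get_attribute("href"))
    try:
        next_btn = wait.until(EC.element_to_be_clickable(
            (By.CSS_SELECTOR, "a.btn-paging-pn.icof.icon-arr-right.paging-next")))
    except TimeoutException:
        break  # no clickable "Next" button; assume this was the last page
    ActionChains(driver).move_to_element(next_btn).click().perform()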
The driver.get(URL) function waits for the page to finish loading before returning control to your code.
In Java there is a method driver.navigate().to(URL) which does not wait for the page to load; however, I cannot find such a method in Python.
For Firefox, I could run profile.set_preference("webdriver.load.strategy", "unstable"); however, Chrome does not have such a preference.
There is also driver.set_page_load_timeout(0.2) with TimeoutException handling.
However, when the exception is handled, the Chrome driver stops loading the page.
EDIT: Rephrased the question a bit and added one solution that does not work.
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
caps = DesiredCapabilities().CHROME
caps["pageLoadStrategy"] = "none" # Do not wait for full page load
driver = webdriver.Chrome(desired_capabilities=caps, executable_path="path/to/chromedriver.exe")
driver.get('some url')
#some code to run straight away
This code will not wait for the page to fully load before it moves to the next line. When using the 'none' page load strategy, you will probably have to make your own custom wait to check whether the element or elements you need have finished loading.
You could use:
import time
time.sleep(5)  # wait a fixed number of seconds
Or, if you want to wait only as long as it takes for an element to load:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as ec
WebDriverWait(driver, timeout=10).until(
    ec.visibility_of_element_located((By.ID, "your_element_id"))
)
This should help you with whatever you are trying to do. Since you gave no code, I can only assume this will work as I could not give a specific answer for a specific web page.
I would like to automatically download text files from the ATS Blocks Download section of the FINRA website. The problem is that while I am able to click on the icon and open the file in the browser, I cannot get the page source after the click; driver.page_source returns the page source of the ATS Blocks Download section page (the one before the click).
Here is a piece of code I was trying out:
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
import time
driver = webdriver.Chrome(ChromeDriverManager().install())
URL = 'https://otctransparency.finra.org/otctransparency/'
driver.get(URL)
# Agree to the general terms
driver.find_element_by_xpath('//*[@class="btn btn-warning"]').click()
#go to ATS Blocks Download section
driver.find_element_by_xpath('//*[@href="/otctransparency/AtsBlocksDownload"]').click()
#wait for the page to fully load
time.sleep(5)
#click on each download icon
for element in driver.find_elements_by_xpath('//*[@src="./assets/icon_download.png"]'):
    element.click()
    print(driver.page_source)
How do I get the page source after every element.click()?
To get the page_source of all the pages, you need to:
Induce WebDriverWait and element_to_be_clickable()
Induce WebDriverWait and visibility_of_all_elements_located()
Induce WebDriverWait and number_of_windows_to_be()
Code:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
driver = webdriver.Chrome()
URL = 'https://otctransparency.finra.org/otctransparency/'
driver.get(URL)
driver.maximize_window()
# Agree to the general terms
WebDriverWait(driver,10).until(EC.element_to_be_clickable((By.XPATH,'//*[@class="btn btn-warning"]'))).click()
#go to ATS Blocks Download section
WebDriverWait(driver,10).until(EC.element_to_be_clickable((By.XPATH,'//a[@href="/otctransparency/AtsBlocksDownload"]'))).click()
#click on each download icon
elements=WebDriverWait(driver,10).until(EC.visibility_of_all_elements_located((By.XPATH,'//img[@src="./assets/icon_download.png"]')))
for link in range(len(elements)):
    elements = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.XPATH, '//img[@src="./assets/icon_download.png"]')))
    elements[link].click()
    WebDriverWait(driver,10).until(EC.number_of_windows_to_be(2))
    windowhandles=driver.window_handles
    driver.switch_to.window(windowhandles[-1])
    WebDriverWait(driver,10).until(EC.visibility_of_element_located((By.TAG_NAME,"pre")))
    print(driver.page_source)
    driver.close()
    driver.switch_to.window(windowhandles[0])
Every time you click a download element, another browser tab is opened. To get the page source from that other tab, use:
for element in driver.find_elements_by_xpath('//*[@src="./assets/icon_download.png"]'):
    element.click()
    driver.switch_to.window(driver.window_handles[1])
    driver.set_page_load_timeout(120)
    print(driver.page_source)
    driver.switch_to.window(driver.window_handles[0])
    driver.set_page_load_timeout(120)
PS. Instead of doing:
time.sleep(5)
You can do:
driver.set_page_load_timeout(120)
Be sure not to mix various "waiting" mechanisms, as doing so can result in unexpected behavior (see this StackOverflow post for why).
Be careful when setting an implicit wait time, since once it is set, it is set for the lifetime of the driver instance (source, although this has been said in various places across the web).
If you intend to have your driver wait on multiple pages, you should use WebDriverWait. As shown in other replies, WebDriverWait(driver, timeout) accepts a WebDriver instance as well as an integer which represents the amount of time to wait before throwing a TimeoutException; in other words, it accepts a timeout.
You can create a new WebDriverWait instance every time you're trying to find an element, without having to create a new WebDriver instance with a new implicit wait time. Since each element may need to be waited on for a different duration, this is ideal. You could go as far as to create a wrapper function to encapsulate the use of WebDriverWait:
def PatientlyClick(by, path, driver, timeout):
    WebDriverWait(driver, timeout).until(EC.element_to_be_clickable((by, path))).click()
The above snippet of code could be made prettier if you designed a class which encapsulated your WebDriver instance, but that might be unnecessary for your purposes (see Page Object Model Design Pattern).
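For illustration only, a minimal sketch of such a wrapper class (the class and method names here are hypothetical, not part of any library):
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


class PatientDriver:
    """Hypothetical wrapper pairing a WebDriver instance with explicit waits."""

    def __init__(self, driver, default_timeout=10):
        self.driver = driver
        self.default_timeout = default_timeout

    def patiently_click(self, by, path, timeout=None):
        # Wait until the element is clickable, then click it
        wait = WebDriverWait(self.driver, timeout or self.default_timeout)
        wait.until(EC.element_to_be_clickable((by, path))).click()

    def patiently_find(self, by, path, timeout=None):
        # Wait until the element is present, then return it
        wait = WebDriverWait(self.driver, timeout or self.default_timeout)
        return wait.until(EC.presence_of_element_located((by, path)))

# Usage sketch: PatientDriver(webdriver.Chrome()).patiently_click(By.ID, "submit")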
I've written a script in Python, in combination with Selenium, to parse names from a webpage. The data on that site is not JavaScript-generated; however, the next-page links are in JavaScript. As the next-page links of that webpage are of no use if I go for the requests library, I have used Selenium to parse the data from that site, traversing 25 pages. The only problem I'm facing is that although my scraper is able to reach the last page by clicking through 25 pages, it only fetches the data from the first page. Moreover, the scraper keeps running even though it has finished clicking the last page. The next-page links look exactly like javascript:nextPage();. By the way, the URL of that site never changes even if I click on the next-page button. How can I get all the names from the 25 pages? The CSS selector I've used in my scraper is flawless. Thanks in advance.
Here is what I've written:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
wait = WebDriverWait(driver, 10)
driver.get("https://www.hsi.com.hk/HSI-Net/HSI-Net?cmd=tab&pageId=en.indexes.hscis.hsci.constituents&expire=false&lang=en&tabs.current=en.indexes.hscis.hsci.overview_des%5Een.indexes.hscis.hsci.constituents&retry=false")
while True:
    for name in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "table.greygeneraltxt td.greygeneraltxt,td.lightbluebg"))):
        print(name.text)
    try:
        n_link = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "a[href*='nextPage']")))
        driver.execute_script(n_link.get_attribute("href"))
    except:
        break
driver.quit()
You don't have to handle the "Next" button or somehow change the page number - all entries are already in the page source. Try the below:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
wait = WebDriverWait(driver, 10)
driver.get("https://www.hsi.com.hk/HSI-Net/HSI-Net?cmd=tab&pageId=en.indexes.hscis.hsci.constituents&expire=false&lang=en&tabs.current=en.indexes.hscis.hsci.overview_des%5Een.indexes.hscis.hsci.constituents&retry=false")
for name in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "table.greygeneraltxt td.greygeneraltxt,td.lightbluebg"))):
    print(name.get_attribute('textContent'))
driver.quit()
You can also try this solution if it's not mandatory for you to use Selenium:
import requests
from lxml import html
r = requests.get("https://www.hsi.com.hk/HSI-Net/HSI-Net?cmd=tab&pageId=en.indexes.hscis.hsci.constituents&expire=false&lang=en&tabs.current=en.indexes.hscis.hsci.overview_des%5Een.indexes.hscis.hsci.constituents&retry=false")
source = html.fromstring(r.content)
for name in source.xpath("//table[@class='greygeneraltxt']//td[text() and position()>1]"):
    print(name.text)
It appears this can actually be done more simply than the current approach. After the driver.get method, you can simply use the page_source property to get the HTML behind it. From there you can get the data from all 25 pages at once. To see how it's structured, just right-click and choose "View source" in Chrome.
html_string = driver.page_source
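For instance, a minimal sketch of parsing that HTML with lxml, reusing the table selector from the other answer (the selector is an assumption about the page structure):
from lxml import html

source = html.fromstring(html_string)
# Pull the cell text out of the constituents table
for name in source.xpath("//table[@class='greygeneraltxt']//td[text() and position()>1]"):
    print(name.text)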
I am trying to create a piece of code that takes an input from the user, searches for that input in his Gmail account, and then checks all the boxes from the sender that matches the input.
Here is my code:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
from selenium.webdriver import Remote
from selenium.webdriver.support.ui import WebDriverWait
import urllib
driver = webdriver.Chrome('D:/chromedriver_win32/chromedriver.exe')
driver.get("http://www.gmail.com")
elem = driver.find_element_by_id('Email')
elem.send_keys("****************")
elem2 = driver.find_element_by_id('next')
elem2.send_keys(Keys.ENTER)
driver.maximize_window()
driver.implicitly_wait(20)
elem3 = driver.find_element_by_name('Passwd')
elem3.send_keys("*************")
driver.find_element_by_id('signIn').send_keys(Keys.ENTER)
driver.implicitly_wait(20)
inp = 'randominput'
driver.find_element_by_name('q').send_keys(inp)
driver.find_element_by_css_selector('button.gbqfb').send_keys(Keys.ENTER)
x = driver.current_url
for i in driver.find_elements_by_css_selector('.zA.zE'):
    print(i.find_element_by_class_name('zF').get_attribute('name'))
    if(i.find_element_by_class_name('zF').get_attribute('name') == inp):
        i.find_element_by_css_selector('.oZ-jc.T-Jo.J-J5-Ji').click()
The main problem is that although the webdriver shows the new page where it has searched for the query, when the code interacts with the page it interacts with the previous one.
I have tried adding an implicit wait, and when I check the current URL it shows the new URL.
The problem is that this is a single-page app, so Selenium isn't going to wait for new data to load. You need to figure out a way to wait for the search results to come back, whether that is based on the 'Loading...' indicator that appears at the top or simply waiting for the first result to change.
Grab an element off of the first page and wait for it to go stale. That will tell you that the page is loading. Then wait for the element you want. This is about the only way to ensure that the page has reloaded and you aren't referencing the original page when dealing with a dynamic page.
To wait for staleness, see staleness_of at http://selenium-python.readthedocs.io/waits.html.
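A minimal sketch of that pattern, reusing the driver from your code and assuming the result rows keep the same '.zA.zE' selector:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(driver, 20)

# Grab an element from the current page before triggering the search
old_result = driver.find_element_by_css_selector('.zA.zE')

# ... trigger the search here, e.g. submit the query as in your code ...

# Wait for the old element to go stale, i.e. the old view is being replaced
wait.until(EC.staleness_of(old_result))

# Now wait for the new results and interact with the fresh elements
results = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, '.zA.zE')))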