Explicit waits in Selenium and PhantomJS with Python

I'm trying to scrape some data off of this site (and many other "wines" on it) and am using Selenium to do so, as it's a JavaScript-heavy site. However, I'm finding that my code only sometimes works; other times it does not return any values, even though nothing has changed.
I think I should use explicit waits with Selenium to overcome this, but I'm not sure how to integrate them, so any guidance would be helpful!
My code is:
def ct_content(url):
    browser = webdriver.PhantomJS()
    browser.get(url)
    wait = WebDriverWait(browser, 10)
    html = browser.page_source
    html = lxml.html.fromstring(html)
    try:
        content = html.xpath('//a[starts-with(@href, "list.asp?Table=List")]/text()')
        browser.quit()
        return content
    except:
        browser.quit()
        return False
Thanks!

Try to use a more specific XPath:
//ul[@class="twin_set_list"]//a/text()
Also, there is no need to use lxml. Simply try:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait as wait
from selenium.webdriver.support import expected_conditions as EC

links = wait(browser, 10).until(
    EC.presence_of_all_elements_located((By.XPATH, '//ul[@class="twin_set_list"]//a')))
data = [link.get_attribute('textContent') for link in links]

It looks like you never actually use the wait you create. This is how I would write the script with an explicit wait.
def ct_content(url):
    browser = webdriver.PhantomJS()
    browser.get(url)
    wait = WebDriverWait(browser, 10)
    try:
        element = wait.until(EC.element_to_be_clickable(
            (By.XPATH, '//a[starts-with(@href, "list.asp?Table=List")]')))
        text = element.text  # read before quitting the browser
        browser.quit()
        return text
    except:
        browser.quit()
        return False
Also, the way to set implicit waits is:
browser.implicitly_wait(10) # seconds
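To make the mechanics of an explicit wait concrete, here is a minimal, browser-free sketch of the polling loop it performs. FakeDriver and wait_until are illustrative stand-ins I made up for this sketch, not Selenium APIs; WebDriverWait does essentially this internally, with exception handling on top:

```python
import time

class FakeDriver:
    """Stand-in for a real webdriver: the element 'appears' after a few polls."""
    def __init__(self, appears_after):
        self.calls = 0
        self.appears_after = appears_after

    def find_link(self):
        self.calls += 1
        if self.calls >= self.appears_after:
            return "<a href='list.asp?Table=List'>"
        return None

def wait_until(driver, condition, timeout=10, poll=0.1):
    """Poll condition(driver) until it returns a truthy value or timeout elapses."""
    end = time.time() + timeout
    while time.time() < end:
        value = condition(driver)
        if value:
            return value
        time.sleep(poll)
    raise TimeoutError("condition never became truthy")

driver = FakeDriver(appears_after=3)
link = wait_until(driver, lambda d: d.find_link(), timeout=2)
print(link)
```

This is why the explicit wait fixes the "only sometimes works" symptom: instead of reading page_source at one arbitrary instant, the condition is retried until the JS-rendered element exists or the timeout expires.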

Related

How to get only links that have a particular id from a list of links using Selenium

I am new to the Selenium framework and I must say it is an awesome library. I am basically trying to get all links from a webpage that have a particular id, "pagination", and isolate them from links that don't have that id, because I want to go through all the pages in each link.
for j in browser.find_elements(By.CSS_SELECTOR, "div#col-content > div.main-menu2.main-menu-gray strong a[href]"):
    print(j.get_property('href'))
The code above gets all the links, with and without pagination.
Example links with pagination:
https://www.oddsportal.com/soccer/africa/africa-cup-of-nations-2015/results/
https://www.oddsportal.com/soccer/england/premier-league-2020-2021/results/
https://www.oddsportal.com/soccer/africa/africa-cup-of-nations-2021/results/
https://www.oddsportal.com/soccer/africa/africa-cup-of-nations-2019/results/
Example links without pagination:
https://www.oddsportal.com/soccer/africa/africa-cup-of-nations/results/
In my code, I try to find whether the given ID exists on the page with pagination = browser.find_element(By.ID, "pagination"), but I stumble on an error. I understand the reason for the error: the ID "pagination" does not exist on some of the pages.
no such element: Unable to locate element: {"method":"css selector","selector":"[id="pagination"]"}
I changed the above code to pagination = browser.find_elements(By.ID, "pagination"), which returns links with and without pagination. So my question is: how can I get only the links that have a particular id from a list of links?
from selenium.webdriver import Chrome, ChromeOptions
from selenium.webdriver.common.by import By
import time
import tqdm
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# define our URL
url = 'https://oddsportal.com/results/'
path = r'C:\Users\Glodaris\OneDrive\Desktop\Repo\Scraper\chromedriver.exe'
options = ChromeOptions()
options.headless = True
browser = Chrome(executable_path=path, options=options)
browser.get(url)
title = browser.title
print('Title', title)
links = []
for i in browser.find_elements(By.CSS_SELECTOR, "div#archive-tables tbody tr[xsid='1'] td a[href]"):
    links.append(i.get_property('href'))
arr = []
condition = True
while condition:
    for link in links:
        second_link = browser.get(link)
        for j in browser.find_elements(By.CSS_SELECTOR, "div#col-content > div.main-menu2.main-menu-gray strong a[href]"):
            browser.implicitly_wait(2)
            pagination = browser.find_element(By.ID, "pagination")
            if pagination:
                print(pagination.get_property('href'))
            else:
                print(j.get_property('href'))
    try:
        browser.find_elements("xpath", "//*[@id='pagination']/a[6]")
    except:
        condition = False
As you are using Selenium, you can actually click the pagination's forward button to navigate through the pages.
The following example will test for the cookie button, scrape the data from the main table as a DataFrame, and check whether there is pagination; if not, it will stop there. If there is pagination, it will navigate to the next page, get the data from the table, navigate to the next page, and so on, until the table data from the current page is identical to the table data from the previous page, at which point it stops. It can handle any number of pages. The setup in the code below is for Linux; what you need to pay attention to is the imports, as well as the part after you define the browser/driver.
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time as t
import pandas as pd

chrome_options = Options()
chrome_options.add_argument("--no-sandbox")
webdriver_service = Service("chromedriver/chromedriver")  # path to where you saved the chromedriver binary
browser = webdriver.Chrome(service=webdriver_service, options=chrome_options)

# url = 'https://www.oddsportal.com/soccer/africa/africa-cup-of-nations/results/'
url = 'https://www.oddsportal.com/soccer/africa/africa-cup-of-nations-2021/results/'
browser.get(url)
try:
    WebDriverWait(browser, 10).until(EC.element_to_be_clickable((By.ID, "onetrust-reject-all-handler"))).click()
except Exception as e:
    print('no cookie button!')
games_table = WebDriverWait(browser, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "table[id='tournamentTable']")))
try:
    initial_games_table_data = games_table.get_attribute('outerHTML')
    dfs = pd.read_html(initial_games_table_data)
    print(dfs[0])
except Exception as e:
    print(e, 'Unfortunately, no matches can be displayed because there are no odds available from your selected bookmakers.')
while True:
    browser.execute_script("window.scrollTo(0,document.body.scrollHeight);")
    t.sleep(1)
    try:
        forward_button = WebDriverWait(browser, 20).until(EC.element_to_be_clickable((By.XPATH, "//div[@id='pagination']//span[text()='»']")))
        forward_button.click()
    except Exception as e:
        print(e, 'no pagination, stopping here')
        break
    games_table = WebDriverWait(browser, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "table[id='tournamentTable']")))
    dfs = pd.read_html(games_table.get_attribute('outerHTML'))
    games_table_data = games_table.get_attribute('outerHTML')
    if games_table_data == initial_games_table_data:
        print('this is the last page')
        break
    print(dfs[0])
    initial_games_table_data = games_table_data
    print('went to next page')
    t.sleep(3)
You are seeing the error message...
no such element: Unable to locate element: {"method":"css selector","selector":"[id="pagination"]"}
...as not all of the pages contain the element:
<div id="pagination">
<a ...>
<a ...>
<a ...>
</div>
Solution
In these cases your best approach would be to wrap up the code block in a try-except block as follows:
for j in browser.find_elements(By.CSS_SELECTOR, "div#col-content > div.main-menu2.main-menu-gray strong a[href]"):
    try:
        WebDriverWait(browser, 20).until(EC.visibility_of_element_located((By.ID, "pagination")))
        print([my_elem.get_attribute("href") for my_elem in WebDriverWait(browser, 20).until(
            EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div#pagination a[href*='page']")))])
    except:
        print("Pagination not available")
        continue
Note: you have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
Update
A couple of things to note.
The (By.ID, "pagination") element doesn't have an href attribute, but several of its descendants do, so you may find conflicting results.
As you are using WebDriverWait remember to remove all the instances of implicitly_wait() as mixing implicit and explicit waits can cause unpredictable wait times. For example setting an implicit wait of 10 seconds and an explicit wait of 15 seconds, could cause a timeout to occur after 20 seconds.
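As a side note on the find_elements approach the question already tried: because find_elements returns an empty list instead of raising, it gives an exception-free presence check. Here is a runnable sketch of that idea; StubDriver is a stand-in I invented so the logic can be exercised without a browser:

```python
class StubDriver:
    """Minimal stand-in for a webdriver: maps a (by, value) locator to its matches."""
    def __init__(self, elements):
        self._elements = elements

    def find_elements(self, by, value):
        # Like Selenium's find_elements, returns [] when nothing matches.
        return self._elements.get((by, value), [])

def has_pagination(driver):
    # An empty list is falsy, so no try/except is needed.
    return bool(driver.find_elements("id", "pagination"))

with_pages = StubDriver({("id", "pagination"): ["<div id='pagination'>"]})
without_pages = StubDriver({})
print(has_pagination(with_pages), has_pagination(without_pages))
```

With a real driver the same check is `if browser.find_elements(By.ID, "pagination"):` which cleanly separates links with pagination from those without.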

How do I search and select the first option from a website in Selenium Python

I have this code which goes to https://xiaomifirmwareupdater.com/miui/ , searches a query, selects the first element, and downloads the ROM, but it always selects the same element despite different queries. I first thought the website was giving the same top result, but I checked with my browser and it gives different results. How can I fix this?
My code:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import Select
from selenium.webdriver.chrome.options import Options
import asyncio
GOOGLE_CHROME_BIN = 'path here'
CHROME_DRIVER = 'driver path here'
async def bruh(query="Redmi Note 8 Pro China"):
    url = "https://xiaomifirmwareupdater.com/miui"
    chrome_options = Options()
    chrome_options.add_argument("--headless")
    chrome_options.binary_location = GOOGLE_CHROME_BIN
    chrome_options.add_argument("--disable-dev-shm-usage")
    chrome_options.add_argument("--no-sandbox")
    chrome_options.add_argument("--disable-gpu")
    driver = webdriver.Chrome(executable_path=CHROME_DRIVER, options=chrome_options)
    driver.get(url)
    await asyncio.sleep(10)
    w = WebDriverWait(driver, 20)
    search_xpath = '/html/body/div[3]/section/div[2]/div[3]/div[1]/div/div[2]/div[1]/div[2]/div/label/input'
    next_page_url_xpath = '/html/body/div[3]/section/div[2]/div[3]/div[1]/div/div[2]/div[2]/div/table/tbody/tr[1]/td[8]/a'
    version_xpath = '/html/body/div[3]/section/div[2]/div[2]/div[2]/div[1]/div/ul/li[3]/h5'
    name_xpath = '/html/body/div[3]/section/div[2]/div[2]/div[2]/div[1]/div/ul/li[8]/h5/span'
    w.until(expected_conditions.presence_of_element_located((By.XPATH, search_xpath)))
    elem = driver.find_element_by_xpath(search_xpath)
    elem.send_keys(query)
    await asyncio.sleep(20)
    next_page_elem = driver.find_element_by_xpath(next_page_url_xpath)
    nextm = next_page_elem.get_attribute('href')
    driver.get(nextm)
    await asyncio.sleep(10)
    version_elem = driver.find_element_by_xpath(version_xpath).text
    name_elem = driver.find_element_by_xpath(name_xpath).text
    version_elem = version_elem.replace("Version: ", "")
    print(version_elem)
    print(name_elem)
    url = f"https://bigota.d.miui.com/{version_elem}/{name_elem}"
    print(url)
    driver.close()
I want to visit the website, send my query, select the first option, and convert it to a downloadable URL. Can anyone help? Thank you.
You are doing a lot of extra stuff that I don't quite follow.
A few pieces of feedback...
Waiting for presence just means that the element is in the DOM, not that it's visible or interactable. If you are going to click or send keys, you need to wait until clickable or visible, respectively, or you may get an exception.
You don't need all those sleeps, especially when you are using WebDriverWait. Best practice is to avoid sleeps (they make your script slower and less reliable) and use WebDriverWait instead.
The wait for an element returns that element, so you don't need to wait, then find, then click... you can just wait.until(...).click().
You were getting the href from a link and then navigating to it... just click the link.
Instead of replace(), I used split()... either is fine. I think split() is less likely to break, e.g. if they change the labels.
Updating your code based on my feedback above, this should work.
driver.get(url)
wait = WebDriverWait(driver, 20)
wait.until(expected_conditions.visibility_of_element_located((By.CSS_SELECTOR, "input"))).send_keys(query)
wait.until(expected_conditions.element_to_be_clickable((By.CSS_SELECTOR, "#miui td > a"))).click()
version = wait.until(expected_conditions.visibility_of_element_located((By.XPATH, "//h5[./b[text()='Version: ']]"))).text.split()[1]
package_name = wait.until(expected_conditions.visibility_of_element_located((By.ID, "filename"))).text
print(version)
print(package_name)
url = f"https://bigota.d.miui.com/{version}/{package_name}"
print(url)
driver.close()

Selenium driver: wait for SVG to be completely rendered

I'm using Selenium with the Chrome driver to scrape pages that contain SVGs.
I need a way to make Selenium wait until the SVG is completely loaded, otherwise I get some incomplete charts when I scrape.
At the moment the script waits 10 seconds before it starts scraping, but that's a lot when scraping 20,000 pages.
def page_loaded(driver):
    path = "//*[local-name() = 'svg']"
    time.sleep(10)
    return driver.find_element_by_xpath(path)

wait = WebDriverWait(self.driver, 10)
wait.until(page_loaded)
Is there an efficient way to check whether the SVG is loaded before starting to scrape?
An example from Selenium documentation:
from selenium.webdriver.support import expected_conditions as EC
wait = WebDriverWait(driver, 10)
element = wait.until(EC.element_to_be_clickable((By.ID, 'someid')))
So in your case it should be :
from selenium.webdriver.support import expected_conditions as EC
wait = WebDriverWait(self.driver, 10)
element = wait.until(EC.presence_of_element_located((By.XPATH, path)))
Here 10 in WebDriverWait(driver, 10) is the maximum number of seconds to wait, i.e. it waits until 10 seconds have passed or the condition is met, whichever comes first.
Some common conditions that are frequently of use when automating web browsers:
title_is title_contains
presence_of_element_located
visibility_of_element_located visibility_of
presence_of_all_elements_located
text_to_be_present_in_element
text_to_be_present_in_element_value
etc.
More available here.
Also here's the documentation for expected conditions support.
Another way you can tackle this is to write your own method, like:
def find_svg(driver):
    element = driver.find_element_by_xpath(path)
    if element:
        return element
    else:
        return False
And then call WebDriverWait like:
element = WebDriverWait(driver, max_secs).until(find_svg)
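Note that mere presence of the <svg> node doesn't guarantee the chart has finished drawing. Since until() accepts any callable, one option (a sketch, under the assumption that the charting library mutates the SVG while rendering) is a stateful condition that only succeeds once the markup stops changing between polls. The StubDriver below is a made-up stand-in; with a real driver, get_svg_markup would be something like driver.find_element_by_xpath(path).get_attribute('outerHTML'):

```python
class svg_is_stable:
    """Custom wait condition: truthy only once two successive polls see identical SVG markup."""
    def __init__(self):
        self._last = None

    def __call__(self, driver):
        current = driver.get_svg_markup()
        stable = current is not None and current == self._last
        self._last = current
        return stable

class StubDriver:
    """Simulates an SVG that grows for a few polls, then stops changing."""
    def __init__(self, frames):
        self._frames = frames
        self._i = 0

    def get_svg_markup(self):
        frame = self._frames[min(self._i, len(self._frames) - 1)]
        self._i += 1
        return frame

driver = StubDriver(["<svg>", "<svg><path/>", "<svg><path/><path/>"])
cond = svg_is_stable()
polls = 0
while not cond(driver):
    polls += 1
print(polls)
```

With Selenium you would pass an instance to the wait, e.g. WebDriverWait(driver, 10).until(svg_is_stable()), which polls it the same way.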

How can I wait till the page reloads [duplicate]

I want to scrape all the data of a page implemented with infinite scroll. The following Python code works.
for i in range(100):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(5)
This means every time I scroll down to the bottom, I need to wait 5 seconds, which is generally enough for the page to finish loading the newly generated contents. But, this may not be time efficient. The page may finish loading the new contents within 5 seconds. How can I detect whether the page finished loading the new contents every time I scroll down? If I can detect this, I can scroll down again to see more contents once I know the page finished loading. This is more time efficient.
The webdriver will wait for a page to load by default via the .get() method.
As you may be looking for some specific element, as @user227215 said, you should use WebDriverWait to wait for an element located on your page:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
browser = webdriver.Firefox()
browser.get("url")
delay = 3  # seconds
try:
    myElem = WebDriverWait(browser, delay).until(EC.presence_of_element_located((By.ID, 'IdOfMyElement')))
    print("Page is ready!")
except TimeoutException:
    print("Loading took too much time!")
I have used it for checking alerts. You can use any other locator strategy to find the element.
EDIT 1:
I should mention that the webdriver waits for a page to load by default. It does not wait for loading inside frames or for AJAX requests. This means that when you use .get('url'), your browser will wait until the page is completely loaded and then move on to the next command in the code. But when you are posting an AJAX request, webdriver does not wait, and it's your responsibility to wait an appropriate amount of time for the page, or a part of the page, to load; that is what the expected_conditions module is for.
Trying to pass find_element_by_id to the constructor for presence_of_element_located (as shown in the accepted answer) caused NoSuchElementException to be raised. I had to use the syntax in fragles' comment:
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
driver = webdriver.Firefox()
driver.get('url')
timeout = 5
try:
    element_present = EC.presence_of_element_located((By.ID, 'element_id'))
    WebDriverWait(driver, timeout).until(element_present)
except TimeoutException:
    print("Timed out waiting for page to load")
This matches the example in the documentation. Here is a link to the documentation for By.
Find below 3 methods:
readyState
Checking page readyState (not reliable):
def page_has_loaded(self):
    self.log.info("Checking if {} page is loaded.".format(self.driver.current_url))
    page_state = self.driver.execute_script('return document.readyState;')
    return page_state == 'complete'
The wait_for helper function is good, but unfortunately click_through_to_new_page is open to the race condition where we manage to execute the script in the old page, before the browser has started processing the click, and page_has_loaded just returns true straight away.
id
Comparing new page ids with the old one:
def page_has_loaded_id(self):
    self.log.info("Checking if {} page is loaded.".format(self.driver.current_url))
    try:
        new_page = browser.find_element_by_tag_name('html')
        return new_page.id != old_page.id
    except NoSuchElementException:
        return False
It's possible that comparing ids is not as effective as waiting for stale reference exceptions.
staleness_of
Using staleness_of method:
@contextlib.contextmanager
def wait_for_page_load(self, timeout=10):
    self.log.debug("Waiting for page to load at {}.".format(self.driver.current_url))
    old_page = self.find_element_by_tag_name('html')
    yield
    WebDriverWait(self, timeout).until(staleness_of(old_page))
For more details, check Harry's blog.
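The before/after idea behind wait_for_page_load generalizes beyond staleness_of: capture some token that identifies the old page, run the body, then poll until the token changes. Here is a runnable, browser-free version of that pattern (wait_for_change is a sketch of mine, not a Selenium API; with Selenium the token would be the old <html> element going stale):

```python
import contextlib
import time

@contextlib.contextmanager
def wait_for_change(get_token, timeout=10, poll=0.05):
    """Capture a token, run the body, then block until the token changes."""
    old = get_token()
    yield
    end = time.time() + timeout
    while time.time() < end:
        if get_token() != old:
            return
        time.sleep(poll)
    raise TimeoutError("token never changed")

# Simulated 'page': the body of the with-block swaps the page id, as a navigation would.
page = {"id": 1}
with wait_for_change(lambda: page["id"], timeout=1):
    page["id"] = 2  # stands in for link.click() triggering a navigation
print("navigation detected")
```

The context-manager shape keeps the click and the wait paired, so it is impossible to forget the wait after the action that triggers the reload.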
As mentioned in the answer from David Cullen, I've always seen recommendations to use a line like the following one:
element_present = EC.presence_of_element_located((By.ID, 'element_id'))
WebDriverWait(driver, timeout).until(element_present)
It was difficult for me to find, in one place, all the possible locators that can be used with By, so I thought it would be useful to provide the list here.
According to Web Scraping with Python by Ryan Mitchell:
ID
Used in the example; finds elements by their HTML id attribute
CLASS_NAME
Used to find elements by their HTML class attribute. Why is this function CLASS_NAME and not simply CLASS? Using the form object.CLASS would create problems for Selenium's Java library, where .class is a reserved method. In order to keep the Selenium syntax consistent between different languages, CLASS_NAME was used instead.
CSS_SELECTOR
Finds elements by their class, id, or tag name, using the #idName, .className, tagName convention.
LINK_TEXT
Finds HTML tags by the text they contain. For example, a link that says "Next" can be selected using (By.LINK_TEXT, "Next").
PARTIAL_LINK_TEXT
Similar to LINK_TEXT, but matches on a partial string.
NAME
Finds HTML tags by their name attribute. This is handy for HTML forms.
TAG_NAME
Finds HTML tags by their tag name.
XPATH
Uses an XPath expression ... to select matching elements.
From selenium/webdriver/support/wait.py
driver = ...
from selenium.webdriver.support.wait import WebDriverWait
element = WebDriverWait(driver, 10).until(
    lambda x: x.find_element_by_id("someId"))
On a side note, instead of scrolling down 100 times, you can check if there are no more modifications to the DOM (we are in the case of the bottom of the page being AJAX lazy-loaded)
def scrollDown(driver, value):
    driver.execute_script("window.scrollBy(0," + str(value) + ")")

# Scroll down the page
def scrollDownAllTheWay(driver):
    old_page = driver.page_source
    while True:
        logging.debug("Scrolling loop")
        for i in range(2):
            scrollDown(driver, 500)
            time.sleep(2)
        new_page = driver.page_source
        if new_page != old_page:
            old_page = new_page
        else:
            break
    return True
Have you tried driver.implicitly_wait? It is like a setting for the driver, so you only call it once per session, and it basically tells the driver to wait the given amount of time until each command can be executed.
driver = webdriver.Chrome()
driver.implicitly_wait(10)
So if you set a wait time of 10 seconds it will execute each command as soon as possible, waiting up to 10 seconds before giving up. I've used this in similar scroll-down scenarios, so I don't see why it wouldn't work in your case. Hope this is helpful.
Be sure to use a lowercase 'w' in implicitly_wait.
Here I did it using a rather simple form:
from selenium import webdriver

browser = webdriver.Firefox()
browser.get("url")
searchTxt = ''
while not searchTxt:
    try:
        searchTxt = browser.find_element_by_name('NAME OF ELEMENT')
        searchTxt.send_keys("USERNAME")
    except:
        continue
Solution for AJAX pages that continuously load data, where the previous methods do not work: grab the page DOM, hash it, and compare old and new hash values over a time delta.
import time
from selenium import webdriver

def page_has_loaded(driver, sleep_time=2):
    '''
    Waits for page to completely load by comparing current page hash values.
    '''
    def get_page_hash(driver):
        '''
        Returns html dom hash
        '''
        # can find element by either 'html' tag or by the html 'root' id
        dom = driver.find_element_by_tag_name('html').get_attribute('innerHTML')
        # dom = driver.find_element_by_id('root').get_attribute('innerHTML')
        dom_hash = hash(dom.encode('utf-8'))
        return dom_hash

    page_hash = 'empty'
    page_hash_new = ''
    # comparing old and new page DOM hash together to verify the page is fully loaded
    while page_hash != page_hash_new:
        page_hash = get_page_hash(driver)
        time.sleep(sleep_time)
        page_hash_new = get_page_hash(driver)
        print('<page_has_loaded> - page not loaded')
    print('<page_has_loaded> - page loaded: {}'.format(driver.current_url))
How about putting WebDriverWait in a while loop and catching the exceptions?
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

browser = webdriver.Firefox()
browser.get("url")
delay = 3  # seconds
while True:
    try:
        WebDriverWait(browser, delay).until(EC.presence_of_element_located((By.ID, 'IdOfMyElement')))
        print("Page is ready!")
        break  # it will break from the loop once the specific element is present
    except TimeoutException:
        print("Loading took too much time! - Try again")
You can do that very simply with this function:
def page_is_loading(driver):
    return driver.execute_script("return document.readyState") == "complete"
and when you want to do something after the page has finished loading, you can use:
Driver = webdriver.Firefox(options=Options, executable_path='geckodriver.exe')
Driver.get("https://www.google.com/")
while not page_is_loading(Driver):
    continue
Driver.execute_script("alert('page is loaded')")
Use this in your code:
from selenium import webdriver

driver = webdriver.Firefox()  # or Chrome()
driver.implicitly_wait(10)  # seconds
driver.get("http://www.......")
Or you can use this code if you are looking for a specific tag:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox()  # or Chrome()
driver.get("http://www.......")
try:
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, "tag_id"))
    )
finally:
    driver.quit()
Very good answers here. Quick example of wait for XPATH.
# wait for sizes to load - 2s timeout
try:
    WebDriverWait(driver, 2).until(expected_conditions.presence_of_element_located(
        (By.XPATH, "//div[@id='stockSizes']//a")))
except TimeoutException:
    pass
I struggled a bit to get this working, as it didn't work for me as expected. Anyone who is still struggling may check this.
I wanted to wait for an element to be present on the webpage before proceeding with my manipulations.
We can use WebDriverWait(driver, 10, 1).until(), but the catch is that until() expects a function which it can execute, for the timeout provided (in our case, 10 seconds), every 1 second. So keeping it like below worked for me:
element_found = wait_for_element.until(lambda x: x.find_element_by_class_name("MY_ELEMENT_CLASS_NAME").is_displayed())
Here is what until() does behind the scenes:
def until(self, method, message=''):
    """Calls the method provided with the driver as an argument until the \
    return value is not False."""
    screen = None
    stacktrace = None
    end_time = time.time() + self._timeout
    while True:
        try:
            value = method(self._driver)
            if value:
                return value
        except self._ignored_exceptions as exc:
            screen = getattr(exc, 'screen', None)
            stacktrace = getattr(exc, 'stacktrace', None)
        time.sleep(self._poll)
        if time.time() > end_time:
            break
    raise TimeoutException(message, screen, stacktrace)
If you are trying to scroll and find all items on a page, you can consider using the following. It is a combination of a few methods mentioned by others here, and it did the job for me:
while True:
    try:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        driver.implicitly_wait(30)
        time.sleep(4)
        elem1 = WebDriverWait(driver, 30).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "element-name")))
        len_elem_1 = len(elem1)
        print(f"A list length {len_elem_1}")
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        driver.implicitly_wait(30)
        time.sleep(4)
        elem2 = WebDriverWait(driver, 30).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "element-name")))
        len_elem_2 = len(elem2)
        print(f"B list length {len_elem_2}")
        if len_elem_1 == len_elem_2:
            print(f"final length = {len_elem_1}")
            break
    except TimeoutException:
        print("Loading took too much time!")
Selenium can't detect whether the page is fully loaded, but JavaScript can. I suggest you try this:
from selenium.webdriver.support.ui import WebDriverWait
WebDriverWait(driver, 100).until(lambda driver: driver.execute_script('return document.readyState') == 'complete')
This executes JavaScript code instead of Python, because JavaScript can detect when the page is fullyly loaded: document.readyState will show 'complete'. This code means: for up to 100 seconds, keep checking document.readyState until it is 'complete'.
nono = driver.current_url
driver.find_element(By.XPATH, "//button[@value='Send']").click()
while driver.current_url == nono:
    pass
print("page loaded.")
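One caveat on the loop above: `while ... : pass` pegs a CPU core and never gives up if the navigation fails. A bounded poll with a short sleep is gentler. Here is a browser-free sketch; wait_for_url_change is my own helper, and current_url is a stand-in for reading driver.current_url:

```python
import time

def wait_for_url_change(get_url, old_url, timeout=10, poll=0.25):
    """Poll get_url() until it differs from old_url; return False if timeout elapses first."""
    end = time.time() + timeout
    while time.time() < end:
        if get_url() != old_url:
            return True
        time.sleep(poll)
    return False

# Simulated navigation: the 'url' changes after a few polls.
state = {"url": "https://example.com/form", "polls": 0}
def current_url():
    state["polls"] += 1
    if state["polls"] >= 3:
        state["url"] = "https://example.com/done"
    return state["url"]

print(wait_for_url_change(current_url, "https://example.com/form", timeout=2))
```

With a real driver you would pass `lambda: driver.current_url`; Selenium's own EC.url_changes(old_url) condition with WebDriverWait does the same job.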

Python Selenium explicit wait does not raise TimeoutException

browser = webdriver.PhantomJS()

def get_score(url):
    browser.get(url)
    elements = wait.until(EC.presence_of_all_elements_located((By.XPATH, "//span[@class='shopdsr-score-con']")))
    description_score = float(elements[0].text)
    service_score = float(elements[1].text)
    logistics_score = float(elements[2].text)
Above is my code. Sometimes it may get stuck for quite a long time, and I would expect a TimeoutException if it gets stuck for 30 seconds, because I have wait = WebDriverWait(browser, 30).
But this never happens; the program just waits there. What's wrong?
As you mentioned, you have set WebDriverWait(browser, 30), so effectively your code will look like:
browser = webdriver.PhantomJS()

def get_score(url):
    browser.get(url)
    elements = WebDriverWait(browser, 30).until(EC.presence_of_all_elements_located((By.XPATH, "//span[@class='shopdsr-score-con']")))
    description_score = float(elements[0].text)
    service_score = float(elements[1].text)
    logistics_score = float(elements[2].text)
Logically, there is no error in your code block, but the program just waits there because when you invoke get(url), the web client, i.e. the PhantomJS browser, doesn't return document.readyState = "complete" that early. The JavaScript and the AJAX calls keep loading, so page loading gets elongated.
Only once document.readyState = "complete" is returned by the web client, i.e. the PhantomJS browser, does Selenium execute the next line of code:
elements = WebDriverWait(browser, 30).until(EC.presence_of_all_elements_located((By.XPATH, "//span[@class='shopdsr-score-con']")))
Hence the program just waits there for some more time.
Update :
As per your comment, you need to look at the pageLoadStrategy option, setting it to either eager or none, as per this Q&A: Don't wait for a page to load using Selenium in Python.
Use try/except:
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

timeout = 30
try:
    element_present1 = EC.presence_of_element_located((By.XPATH, "//input[@name='username']"))
    WebDriverWait(driver, timeout).until(element_present1)
except TimeoutException:
    print("Timed out waiting for login page to load")
