How to do pagination with scroll in Selenium? - python

I need to do pagination for this page:
I read this question and I tried this:
scrolls = 10
while True:
    scrolls -= 1
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
    time.sleep(3)
    if scrolls < 0:
        break
I need to scroll down to get all the products, but I don't know how many times I need to scroll to get them all.
I also tried using a very tall window
'SELENIUM_DRIVER_ARGUMENTS': ['--no-sandbox', '--window-size=1920,30000'],
and scrolling down:
time.sleep(10)
self.driver.execute_script("window.scrollBy(0, 30000);")
Does someone have an idea how to get all the products?
I'm open to another solution, if Selenium is not the best for this case.
Thanks.
UPDATE 1:
I need to have all the product IDs. To get the product IDs I use this:
products = response.css('div.jfJiHa > .iepIep')
for product in products:
    detail_link = product.css('a.jXwbaQ::attr("href")').get()
    product_id = re.findall(r'products/(\d+)', detail_link)[0]

As commented, without seeing your whole spider it is hard to see where you are going wrong here, but if we assume that your parsing is using the scrapy response then that is why you are always just getting 30 products.
You need to create a new selector from the driver after each scroll and query that. A full example of code that gets 300 items from the page is
import re
import time
from pprint import pprint
import parsel
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver import Firefox
with Firefox() as driver:
    driver.get("https://www.compraonline.bonpreuesclat.cat/products/search?q=pasta")
    all_items = {}
    while True:
        sel = parsel.Selector(driver.page_source)
        for product in sel.css("div[data-test] h3 > a"):
            name = product.css("::text").get()
            product_id = re.search(r"(\d+)", product.attrib["href"]).group()
            all_items[product_id] = name
        try:
            element = driver.find_element_by_css_selector(
                "div[data-test] + div.iepIep:not([data-test])"
            )
        except NoSuchElementException:
            break
        driver.execute_script("arguments[0].scrollIntoView(true);", element)
        time.sleep(1)
pprint(all_items)
print("Number of items =", len(all_items))
The key parts of this are:
After getting the page using driver.get we start looping
We create a new Selector (here I directly use parsel.Selector which is what scrapy uses internally)
We extract the info we need. Displayed products all have a data-test attribute. If this was a scrapy.Spider I'd yield the information (see the sketch after this list), but here I just add it to a dictionary of all items.
After getting all the visible items, we try to find the first following sibling of a div with a data-test attribute that doesn't have a data-test attribute (using the CSS + combinator)
If no such element exists (because we have seen all items) then break out of the loop, otherwise scroll that element into view and pause a second
Repeat until all items have been parsed
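For reference, here is a minimal sketch of what that would look like as a scrapy.Spider that yields the items instead of collecting them in a dict. The spider name and the Firefox setup are assumptions; the selectors are the same ones used above.
import re
import time
import scrapy
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver import Firefox
class PastaSpider(scrapy.Spider):
    # Hypothetical spider name; only meant to illustrate yielding instead of using a dict
    name = "pasta"
    start_urls = ["https://www.compraonline.bonpreuesclat.cat/products/search?q=pasta"]
    def parse(self, response):
        seen = set()
        with Firefox() as driver:
            driver.get(response.url)
            while True:
                sel = scrapy.Selector(text=driver.page_source)
                for product in sel.css("div[data-test] h3 > a"):
                    product_id = re.search(r"(\d+)", product.attrib["href"]).group()
                    if product_id not in seen:  # avoid yielding duplicates on every scroll
                        seen.add(product_id)
                        yield {"product_id": product_id, "name": product.css("::text").get()}
                try:
                    element = driver.find_element_by_css_selector(
                        "div[data-test] + div.iepIep:not([data-test])"
                    )
                except NoSuchElementException:
                    break
                driver.execute_script("arguments[0].scrollIntoView(true);", element)
                time.sleep(1)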

Try scrolling down one visible screen height at a time, reading the presented products each time, until the //button[@data-test='footer-feedback-button'] (or any other element located at the bottom of the page) is visible.
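A rough sketch of that approach (the footer button locator is the one mentioned above; the URL, the scroll step, and the in-viewport check are assumptions):
import time
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver import Firefox
driver = Firefox()
driver.get("https://www.compraonline.bonpreuesclat.cat/products/search?q=pasta")
while True:
    # ... read the currently presented products here ...
    try:
        footer = driver.find_element_by_xpath("//button[@data-test='footer-feedback-button']")
        in_view = driver.execute_script(
            "const r = arguments[0].getBoundingClientRect();"
            "return r.top >= 0 && r.bottom <= window.innerHeight;",
            footer,
        )
        if in_view:
            break  # the bottom of the page is on screen, so everything has been presented
    except NoSuchElementException:
        pass
    driver.execute_script("window.scrollBy(0, window.innerHeight);")  # one visible screen height
    time.sleep(2)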

This code may help -
from selenium import webdriver
from selenium.common.exceptions import StaleElementReferenceException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait
driver = webdriver.Chrome()
wait = WebDriverWait(driver, 30)
driver.get('https://www.compraonline.bonpreuesclat.cat/products/search?q=pasta')
BaseDivs = driver.find_elements_by_xpath("//div[contains(@class,\"base__Wrapper\")]")
for div in BaseDivs:
    try:
        wait.until(EC.visibility_of_element_located((By.XPATH, "./descendant::img")))
        driver.execute_script("return arguments[0].scrollIntoView(true);", div)
    except StaleElementReferenceException:
        continue
This code will wait for the image to load and then focus on the element. This way it will automatically scroll down till the end of the page.
Mark it answer if this is what you are looking for.

I solved my problem, but not with Selenium. We can get all the products of the search with another request:
https://www.compraonline.bonpreuesclat.cat/api/v4/products/search?limit=1000&offset=0&sort=favorite&term=pasta
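For completeness, a minimal sketch of fetching that endpoint with requests; the exact shape of the JSON (including the key that holds the product list) is an assumption, so inspect the response first:
import requests
url = ("https://www.compraonline.bonpreuesclat.cat/api/v4/products/search"
       "?limit=1000&offset=0&sort=favorite&term=pasta")
data = requests.get(url, timeout=30).json()
products = data.get("products", [])  # assumed key, check the actual response
print("Number of products:", len(products))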

Related

How can I scrape a webpage that only loads it in increments

I am trying to count the number of items that contain the word "kudoed" from a particular webpage. Now, the webpage itself only loads a limited number of items initially and then requires a button to be pressed to load the rest.
I wrote a selenium + beautiful soup code to do this. The reason why I had to use selenium is due to some proxy errors. Here is my full code so far:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import time
driver = webdriver.Edge(executable_path = r"C:\Users\H\Desktop\Automated_Tasks\msedgedriver.exe") # Modify the path here...
# Navigate to URL
driver.get("https://powerusers.microsoft.com/t5/notificationfeed/page")
# Wait for the page to load
wait = WebDriverWait(driver, 10)
# Get all elements on the page
time.sleep(8)
click_button=driver.find_element("xpath", '/html/body/div[2]/center/div[4]/div/div/div/div[1]/div[3]/div/div/div/div/span/a').click()
element = driver.find_element("ID", 'viewMoreLink')
driver.execute_script("arguments[0].click();", element)
from bs4 import BeautifulSoup
# Get the page source
page_source = driver.page_source
# Create a BeautifulSoup object
soup = BeautifulSoup(page_source, 'html.parser')
items = soup.find_all("div", class_="lia-quilt-column-alley lia-quilt-column-alley-right")
count = 0
for item in items:
    if "kudoed" in item.text:
        count += 1
print(f"Number of items containing 'kudoed': {count}")
Is there a way for me to click the button without having to tell Selenium to click the button, wait for the next items to load, and repeat these steps until the entire list has been loaded?
When it gets to the code:
click_button=driver.find_element("xpath", '/html/body/div[2]/center/div[4]/div/div/div/div[1]/div[3]/div/div/div/div/span/a').click()
I get the following error:
ElementClickInterceptedException: element click intercepted: Element is not clickable at point (476, 2184)
(Session info: MicrosoftEdge=109.0.1518.61)
I tried searching by ID and it still did not work.
Usually ElementClickInterceptedException means that you are trying to click an element not visible on the page, so before clicking it you have to scroll to it.
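For example, a minimal scroll-then-click for the button in question could look like this (using the viewMoreLink id from your own code):
from selenium.webdriver.common.by import By
button = driver.find_element(By.ID, 'viewMoreLink')
# bring the button into the middle of the viewport so nothing overlaps it, then click
driver.execute_script('arguments[0].scrollIntoView({block: "center"});', button)
button.click()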
Is there a way for me to click the button without having to tell selenium to click the button, wait for the next items to load and repeat these steps until the entire list has been loaded?
I don't think so, but it's not that hard to do the job:
items_old, items = [], []
while 1:
    while len(items) == len(items_old):
        items = WebDriverWait(driver, 9).until(EC.visibility_of_all_elements_located((By.CLASS_NAME, 'lia-notification-feed-item')))
    print(f'{len(items)=}')
    show_more_btn = driver.find_elements(By.ID, 'viewMoreLink')
    if show_more_btn:
        print('load more items')
        driver.execute_script('arguments[0].scrollIntoView({block: "center"});', show_more_btn[0])
        time.sleep(2)
        show_more_btn[0].click()
        items_old = items.copy()
    else:
        print('all items loaded')
        break
print(f"Number of items containing 'kudoed': {sum(['kudoed' in x.text for x in items])}")
Output
len(items)=25
load more items
len(items)=32
all items loaded
Number of items containing 'kudoed': 2

Why can't I iterate through a list in Selenium?

I am trying to scrape the headers of Wikipedia pages as an exercise, and I want to be able to distinguish between headers with "h2" and "h3" tags.
Therefore I wrote this code:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys #For being able to input key presses
import time #Useful for if your browser is faster than your code
PATH = r"C:\Users\Alireza\Desktop\chromedriver\chromedriver.exe" #Location of the chromedriver
driver = webdriver.Chrome(PATH)
driver.get("https://de.wikipedia.org/wiki/Alpha-Beta-Suche") #Open website in Chrome
print(driver.title) #Print title of the website to console
h1Header = driver.find_element_by_tag_name("h1") #Find the first heading in the article
h2HeaderTexts = driver.find_elements_by_tag_name("h2") #List of all other major headers in the article
h3HeaderTexts = driver.find_elements_by_tag_name("h3") #List of all minor headers in the article
for items in h2HeaderTexts:
    scor = items.find_element_by_class_name("mw-headline")
driver.quit()
However, this does not work and the program does not terminate.
Anybody have a solution for this?
The problem here lies in the for loop! Apparently I cannot scrape any elements by class name (or anything else) from the elements in h2HeaderTexts, although this should be possible.
You can filter in the XPath itself:
PATH = r"C:\Users\Alireza\Desktop\chromedriver\chromedriver.exe" #Location of the chromedriver
driver = webdriver.Chrome(PATH)
driver.maximize_window()
driver.implicitly_wait(30)
driver.get("https://de.wikipedia.org/wiki/Alpha-Beta-Suche") #Open website in Chrome
print(driver.title)
for item in driver.find_elements(By.XPATH, "//h2/span[@class='mw-headline']"):
    print(item.text)
This should give you the h2 heading elements with the class mw-headline.
Output:
Informelle Beschreibung
Der Algorithmus
Implementierung
Optimierungen
Vergleich von Minimax und AlphaBeta
Geschichte
Literatur
Weblinks
Fußnoten
Process finished with exit code 0
Update 1 :
The reason why your loop is still running and the program does not terminate is that, if you look at the page HTML source, the first h2 tag does not have a child span with mw-headline, so Selenium is trying to locate an element which is not there in the HTML DOM. Also, you are using find_elements, which returns a list of web elements if found and an empty list if not, which is why you do not see an exception either.
You should wait until elements appear on the page before accessing them.
Also, there are several elements with the h1 tag name there.
To search for elements inside an element you should use an XPath starting with a dot. Otherwise it will search for the first match on the entire page.
The first h2 element on that page has no element with class name mw-headline inside it. So, you should handle this issue too.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC #For being able to input key presses
import time #Useful for if your browser is faster than your code
PATH = r"C:\Users\Alireza\Desktop\chromedriver\chromedriver.exe" #Location of the chromedriver
driver = webdriver.Chrome(PATH)
wait = WebDriverWait(driver, 20)
driver.get("https://de.wikipedia.org/wiki/Alpha-Beta-Suche") #Open website in Chrome
print(driver.title) #Print title of the website to console
wait.until(EC.visibility_of_element_located((By.XPATH, "//h1")))
h1Headers = driver.find_elements_by_tag_name("h1") #Find the first heading in the article
h2HeaderTexts = driver.find_elements_by_tag_name("h2") #List of all other major headers in the article
h3HeaderTexts = driver.find_elements_by_tag_name("h3") #List of all minor headers in the article
for items in h2HeaderTexts:
    scor = items.find_elements_by_xpath(".//span[@class='mw-headline']")
    if scor:
        pass  # do what you need with the scor[0] element
driver.quit()
Your version does not finish executing because Selenium will drop the process if it cannot locate an element.
Devs do not like to use try/catch, but I personally have not found a better way to work around it. If you do:
for items in h2HeaderTexts:
    try:
        scor = items.find_element_by_class_name('mw-headline').text
        print(scor)
    except:
        print('nothing found')
You will notice that it will execute till the end and you have a result.

Convert output from Selenium in Python

I have this code:
from selenium import webdriver
# Set url and path
url = 'https://osu.ppy.sh/beatmapsets?m=0'
driver = webdriver.Chrome(executable_path=r"C:\Users\Gabri\anaconda3\chromedriver.exe")
driver.get(url)
# Select the item that i want
try:
    dif = driver.find_element_by_css_selector('div.beatmapset-panel__beatmap-dot')
    dif.text
    print(dif)
except:
    print('not found')
I'm trying to select this map difficulty, 'expert' in purple --> https://imgur.com/a/G224rka, but I can't continue with my code because the output is "<selenium.webdriver.remote.webelement.WebElement (session="3cdaf38d0673d0aebe49733d629eae5c", element="60d6241b-80f7-42c8-bf38-9fd2c8574b08")>" and I expected it to be a string like "expert" or "--bg:var(--diff-expert);". How can I translate or convert it? I did try to select with '[class*="beatmapset-panel__beatmap-dot"' and the output is the same. Can someone help me?
You need to change your code as follows to print the element text:
from selenium import webdriver
# Set url and path
url = 'https://osu.ppy.sh/beatmapsets?m=0'
driver = webdriver.Chrome(executable_path=r"C:\Users\Gabri\anaconda3\chromedriver.exe")
driver.get(url)
# Select the item that i want
try:
    dif = driver.find_element_by_css_selector('div.beatmapset-panel__beatmap-dot')
    print(dif.text)
except:
    print('not found')
You need to hover over an element, then wait for the data to appear, and get its text:
Here is the snippet for the first game from the list. To get all games you will need another loop.
I am using ActionChains to hover on element. Finding locators for this site was not easy even for me.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
driver = webdriver.Chrome(executable_path='/snap/bin/chromium.chromedriver')
driver.get("https://osu.ppy.sh/beatmapsets?m=0")
wait = WebDriverWait(driver, 20)
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, ".beatmapsets__items-row:nth-of-type(1)>.beatmapsets__item:nth-of-type(1)")))
games = driver.find_element_by_css_selector(".beatmapsets__items-row:nth-of-type(1) .beatmapsets__item:nth-of-type(1) .beatmapset-panel__info-row--extra")
actions = ActionChains(driver)
actions.move_to_element(games).perform()
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".beatmaps-popup__group")))
levels = driver.find_elements_by_css_selector(".beatmaps-popup__group .beatmaps-popup-item__col.beatmaps-popup-item__col--name.u-ellipsis-overflow")
for level in levels:
    print(level.text)
Output:
Hinsvar's Hard
Zelqurre's Insane
Amamir's Shining Stars
For iteration through a list of levels use this css selector:
.beatmapsets__items-row:nth-of-type(1) .beatmapsets__item:nth-of-type(1) .beatmapset-panel__info-row--extra
And iterate this locator (a sketch of the loop follows these selectors):
.beatmapsets__items-row:nth-of-type(1) .beatmapsets__item:nth-of-type(1) .beatmapset-panel__info-row--extra,
.beatmapsets__items-row:nth-of-type(2) .beatmapsets__item:nth-of-type(1) .beatmapset-panel__info-row--extra
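A sketch of that iteration, reusing wait, ActionChains, and the popup selectors from the snippet above; treating an empty find_elements result as the end of the list is an assumption:
row = 1
while True:
    selector = (f".beatmapsets__items-row:nth-of-type({row}) "
                ".beatmapsets__item:nth-of-type(1) .beatmapset-panel__info-row--extra")
    panels = driver.find_elements_by_css_selector(selector)
    if not panels:
        break  # no more rows found
    ActionChains(driver).move_to_element(panels[0]).perform()
    wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".beatmaps-popup__group")))
    for level in driver.find_elements_by_css_selector(
            ".beatmaps-popup__group .beatmaps-popup-item__col.beatmaps-popup-item__col--name.u-ellipsis-overflow"):
        print(level.text)
    row += 1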
Update:
To get scores use:
scores= driver.find_elements_by_css_selector(".beatmaps-popup__group .beatmaps-popup-item__col.beatmaps-popup-item__col--difficulty")
for score in scores:
    print(score.text)
The output will be:
2.58
3.46
4.55
4.90
5.97
Also, check this answer on how to put results in one list: Trouble retrieving elements and looping pages using next page button
And finally read here about css selectors: https://www.w3schools.com/cssref/css_selectors.asp
I usually prefer using them because they are shorter.

How to web-scrape in for loop, without losing DOM? (Python, Selenium)

I'm trying to get data from the Polish Wiktionary. Code:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
import time
driver = webdriver.Chrome()
driver.get("https://pl.wiktionary.org/wiki/Kategoria:J%C4%99zyk_polski_-_rzeczowniki")
page = driver.find_element_by_xpath('//*[@id="mw-pages"]/div/div')
words = page.find_elements_by_tag_name('li') #loading all the words
delay = 30
for word in words:
    myElem = WebDriverWait(driver, delay).until(EC.presence_of_element_located((By.XPATH, '//*[@id="mw-pages"]/a[2]')))
    word.find_element_by_tag_name('a').click() #entering word
    #COLLECTING DATA
    driver.back()
    # also tried with driver.execute_script("window.history.go(-1)") - same result
    time.sleep(5) #added to make sure that time is not an obstacle
I get this error while trying to enter next word:
StaleElementReferenceException: stale element reference: element is not attached to the page document
(Session info: chrome=88.0.4324.190)
When you click you're changing the page which renders the previous elements stale.
So you need to either collect the pages you want to go to FIRST and step through them or you need to keep track of which element you're viewing and increment when you go back:
i = 0
for word in words:
    myElem = WebDriverWait(driver, delay).until(EC.presence_of_element_located((By.XPATH, '//*[@id="mw-pages"]/a[2]')))
    word.find_elements_by_tag_name('a')[i].click() #entering word
    #COLLECTING DATA
    driver.back()
    i += 1
    # also tried with driver.execute_script("window.history.go(-1)") - same result
    time.sleep(5) #added to make sure that time is not an obstacle
But, as you can find on Stack Overflow, there are ways to launch the link in a NEW window, switch_to that window, grab the data, then close that window and proceed on to the next link element.
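A rough sketch of that new-window variant, using the same words list and selectors as the question (the handle bookkeeping is standard Selenium, but the overall flow here is an assumption):
main_window = driver.current_window_handle
for word in words:
    link = word.find_element_by_tag_name('a').get_attribute('href')
    driver.execute_script("window.open(arguments[0]);", link)  # open the word in a new tab
    driver.switch_to.window(driver.window_handles[-1])         # work in the new tab
    #COLLECTING DATA
    driver.close()                                             # close the tab again
    driver.switch_to.window(main_window)                       # back to the category page
Because the category page in the main window never navigates away, the word elements never go stale.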
Normally, when we are working with anchor (a href) tags, we get their href values and then loop over them with driver.get():
driver.get("https://pl.wiktionary.org/wiki/Kategoria:J%C4%99zyk_polski_-_rzeczowniki")
ahrefs= [x.get_attribute('href') for x in driver.find_elements_by_xpath('//*[#id="mw-pages"]/div/div//li/a')]
for ahref in ahrefs:
driver.get(ahref)

How can I wait till the page reloads [duplicate]

I want to scrape all the data of a page implemented by a infinite scroll. The following python code works.
for i in range(100):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(5)
This means every time I scroll down to the bottom, I need to wait 5 seconds, which is generally enough for the page to finish loading the newly generated contents. But, this may not be time efficient. The page may finish loading the new contents within 5 seconds. How can I detect whether the page finished loading the new contents every time I scroll down? If I can detect this, I can scroll down again to see more contents once I know the page finished loading. This is more time efficient.
The webdriver will wait for a page to load by default via .get() method.
As you may be looking for some specific element as #user227215 said, you should use WebDriverWait to wait for an element located in your page:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
browser = webdriver.Firefox()
browser.get("url")
delay = 3 # seconds
try:
    myElem = WebDriverWait(browser, delay).until(EC.presence_of_element_located((By.ID, 'IdOfMyElement')))
    print("Page is ready!")
except TimeoutException:
    print("Loading took too much time!")
I have used it for checking alerts. You can use any of the other methods to find the locator.
EDIT 1:
I should mention that the webdriver will wait for a page to load by default. It does not wait for loading inside frames or for ajax requests. It means when you use .get('url'), your browser will wait until the page is completely loaded and then go to the next command in the code. But when you are posting an ajax request, webdriver does not wait and it's your responsibility to wait an appropriate amount of time for the page or a part of page to load; so there is a module named expected_conditions.
Trying to pass find_element_by_id to the constructor for presence_of_element_located (as shown in the accepted answer) caused NoSuchElementException to be raised. I had to use the syntax in fragles' comment:
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
driver = webdriver.Firefox()
driver.get('url')
timeout = 5
try:
    element_present = EC.presence_of_element_located((By.ID, 'element_id'))
    WebDriverWait(driver, timeout).until(element_present)
except TimeoutException:
    print("Timed out waiting for page to load")
This matches the example in the documentation. Here is a link to the documentation for By.
Find below 3 methods:
readyState
Checking page readyState (not reliable):
def page_has_loaded(self):
    self.log.info("Checking if {} page is loaded.".format(self.driver.current_url))
    page_state = self.driver.execute_script('return document.readyState;')
    return page_state == 'complete'
The wait_for helper function is good, but unfortunately click_through_to_new_page is open to the race condition where we manage to execute the script in the old page, before the browser has started processing the click, and page_has_loaded just returns true straight away.
id
Comparing new page ids with the old one:
def page_has_loaded_id(self):
    self.log.info("Checking if {} page is loaded.".format(self.driver.current_url))
    try:
        new_page = browser.find_element_by_tag_name('html')
        return new_page.id != old_page.id
    except NoSuchElementException:
        return False
It's possible that comparing ids is not as effective as waiting for stale reference exceptions.
staleness_of
Using staleness_of method:
@contextlib.contextmanager
def wait_for_page_load(self, timeout=10):
    self.log.debug("Waiting for page to load at {}.".format(self.driver.current_url))
    old_page = self.find_element_by_tag_name('html')
    yield
    WebDriverWait(self, timeout).until(staleness_of(old_page))
For more details, check Harry's blog.
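The intended usage is to wrap the action that triggers the navigation, roughly like this (a sketch, assuming the method lives on a class that wraps the driver, as the self parameter suggests):
with self.wait_for_page_load(timeout=10):
    self.find_element_by_link_text("Next").click()
# past this point the old <html> element is stale, so a new page has replaced it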
As mentioned in the answer from David Cullen, I've always seen recommendations to use a line like the following one:
element_present = EC.presence_of_element_located((By.ID, 'element_id'))
WebDriverWait(driver, timeout).until(element_present)
It was difficult for me to find somewhere all the possible locators that can be used with the By, so I thought it would be useful to provide the list here.
According to Web Scraping with Python by Ryan Mitchell:
ID
Used in the example; finds elements by their HTML id attribute.
CLASS_NAME
Used to find elements by their HTML class attribute. Why is this function CLASS_NAME not simply CLASS? Using the form object.CLASS would create problems for Selenium's Java library, where .class is a reserved method. In order to keep the Selenium syntax consistent between different languages, CLASS_NAME was used instead.
CSS_SELECTOR
Finds elements by their class, id, or tag name, using the #idName, .className, tagName convention.
LINK_TEXT
Finds HTML tags by the text they contain. For example, a link that says "Next" can be selected using (By.LINK_TEXT, "Next").
PARTIAL_LINK_TEXT
Similar to LINK_TEXT, but matches on a partial string.
NAME
Finds HTML tags by their name attribute. This is handy for HTML forms.
TAG_NAME
Finds HTML tags by their tag name.
XPATH
Uses an XPath expression ... to select matching elements.
From selenium/webdriver/support/wait.py
driver = ...
from selenium.webdriver.support.wait import WebDriverWait
element = WebDriverWait(driver, 10).until(
    lambda x: x.find_element_by_id("someId"))
On a side note, instead of scrolling down 100 times, you can check if there are no more modifications to the DOM (we are in the case of the bottom of the page being AJAX lazy-loaded)
def scrollDown(driver, value):
    driver.execute_script("window.scrollBy(0,"+str(value)+")")
# Scroll down the page
def scrollDownAllTheWay(driver):
    old_page = driver.page_source
    while True:
        logging.debug("Scrolling loop")
        for i in range(2):
            scrollDown(driver, 500)
            time.sleep(2)
        new_page = driver.page_source
        if new_page != old_page:
            old_page = new_page
        else:
            break
    return True
Have you tried driver.implicitly_wait? It is like a setting for the driver, so you only call it once in the session and it basically tells the driver to wait the given amount of time until each command can be executed.
driver = webdriver.Chrome()
driver.implicitly_wait(10)
So if you set a wait time of 10 seconds it will execute the command as soon as possible, waiting 10 seconds before it gives up. I've used this in similar scroll-down scenarios so I don't see why it wouldn't work in your case. Hope this is helpful.
Be sure to use a lower case 'w' in implicitly_wait.
Here I did it using a rather simple form:
from selenium import webdriver
browser = webdriver.Firefox()
browser.get("url")
searchTxt = ''
while not searchTxt:
    try:
        searchTxt = browser.find_element_by_name('NAME OF ELEMENT')
        searchTxt.send_keys("USERNAME")
    except:
        continue
Solution for AJAX pages that continuously load data. The previous methods stated do not work. What we can do instead is grab the page DOM, hash it, and compare old and new hash values over a delta time.
import time
from selenium import webdriver
def page_has_loaded(driver, sleep_time = 2):
    '''
    Waits for page to completely load by comparing current page hash values.
    '''
    def get_page_hash(driver):
        '''
        Returns html dom hash
        '''
        # can find element by either 'html' tag or by the html 'root' id
        dom = driver.find_element_by_tag_name('html').get_attribute('innerHTML')
        # dom = driver.find_element_by_id('root').get_attribute('innerHTML')
        dom_hash = hash(dom.encode('utf-8'))
        return dom_hash
    page_hash = 'empty'
    page_hash_new = ''
    # comparing old and new page DOM hash together to verify the page is fully loaded
    while page_hash != page_hash_new:
        page_hash = get_page_hash(driver)
        time.sleep(sleep_time)
        page_hash_new = get_page_hash(driver)
        print('<page_has_loaded> - page not loaded')
    print('<page_has_loaded> - page loaded: {}'.format(driver.current_url))
How about putting WebDriverWait in a while loop and catching the exceptions?
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
browser = webdriver.Firefox()
browser.get("url")
delay = 3 # seconds
while True:
    try:
        WebDriverWait(browser, delay).until(EC.presence_of_element_located((By.ID, 'IdOfMyElement')))
        print("Page is ready!")
        break # it will break from the loop once the specific element is present.
    except TimeoutException:
        print("Loading took too much time! - Try again")
You can do that very simply with this function:
import time
def page_is_loading(driver):
    # poll document.readyState until the browser reports 'complete'
    while True:
        x = driver.execute_script("return document.readyState")
        if x == "complete":
            return True
        time.sleep(0.2)
and when you want to do something after the page has finished loading, you can use:
Driver = webdriver.Firefox(options=Options, executable_path='geckodriver.exe')
Driver.get("https://www.google.com/")
while not page_is_loading(Driver):
    continue
Driver.execute_script("alert('page is loaded')")
Use this in your code:
from selenium import webdriver
driver = webdriver.Firefox() # or Chrome()
driver.implicitly_wait(10) # seconds
driver.get("http://www.......")
or you can use this code if you are looking for a specific tag :
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Firefox() #or Chrome()
driver.get("http://www.......")
try:
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, "tag_id"))
    )
finally:
    driver.quit()
Very good answers here. Quick example of wait for XPATH.
# wait for sizes to load - 2s timeout
try:
    WebDriverWait(driver, 2).until(expected_conditions.presence_of_element_located(
        (By.XPATH, "//div[@id='stockSizes']//a")))
except TimeoutException:
    pass
I struggled a bit to get this working as it didn't work for me as expected. Anyone who is still struggling to get this working may check this.
I want to wait for an element to be present on the webpage before proceeding with my manipulations.
We can use WebDriverWait(driver, 10, 1).until(), but the catch is that until() expects a function which it can execute every 1 second for the duration of the timeout provided (in our case 10 seconds). So keeping it like below worked for me.
element_found = wait_for_element.until(lambda x: x.find_element_by_class_name("MY_ELEMENT_CLASS_NAME").is_displayed())
Here is what until() does behind the scenes:
def until(self, method, message=''):
    """Calls the method provided with the driver as an argument until the \
    return value is not False."""
    screen = None
    stacktrace = None
    end_time = time.time() + self._timeout
    while True:
        try:
            value = method(self._driver)
            if value:
                return value
        except self._ignored_exceptions as exc:
            screen = getattr(exc, 'screen', None)
            stacktrace = getattr(exc, 'stacktrace', None)
        time.sleep(self._poll)
        if time.time() > end_time:
            break
    raise TimeoutException(message, screen, stacktrace)
If you are trying to scroll and find all items on a page, you can consider using the following. This is a combination of a few methods mentioned by others here, and it did the job for me:
while True:
    try:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        driver.implicitly_wait(30)
        time.sleep(4)
        elem1 = WebDriverWait(driver, 30).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "element-name")))
        len_elem_1 = len(elem1)
        print(f"A list Length {len_elem_1}")
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        driver.implicitly_wait(30)
        time.sleep(4)
        elem2 = WebDriverWait(driver, 30).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "element-name")))
        len_elem_2 = len(elem2)
        print(f"B list Length {len_elem_2}")
        if len_elem_1 == len_elem_2:
            print(f"final length = {len_elem_1}")
            break
    except TimeoutException:
        print("Loading took too much time!")
Selenium can't detect whether the page is fully loaded or not, but JavaScript can. I suggest you try this.
from selenium.webdriver.support.ui import WebDriverWait
WebDriverWait(driver, 100).until(lambda driver: driver.execute_script('return document.readyState') == 'complete')
This will execute JavaScript code instead of using Python, because JavaScript can detect when the page is fully loaded and will report 'complete'. This code means: for up to 100 seconds, keep checking document.readyState until 'complete' shows up.
nono = driver.current_url
driver.find_element(By.XPATH, "//button[@value='Send']").click()
while driver.current_url == nono:
    pass
print("page loaded.")
