How to wait for the page to fully load? - python

I'm trying to web-scrape LinkedIn Sales Navigator, but the page only loads additional results when I scroll down and wait a bit. I tried to do this the following way, but it only added 3 more elements, so I get 6 out of 25.
from selenium import webdriver
from selenium.webdriver.common.by import By
import time

profile_url = "https://www.linkedin.com/sales/search/people?query=(recentSearchParam%3A(doLogHistory%3Atrue)%2Cfilters%3AList((type%3AREGION%2Cvalues%3AList((id%3A102393603%2Ctext%3AAsia%2CselectionType%3AINCLUDED)))))&sessionId=fvwzsZ8CTAKdGri5T5mYZw%3D%3D"
driver.get(profile_url)
time.sleep(20)
action = webdriver.ActionChains(driver)
to_scroll = driver.find_element(By.XPATH, "//button[@id='ember105']")
to_scroll_up = driver.find_element(By.XPATH, "//button[@id='ember141']")
action.move_to_element(to_scroll)
action.perform()
time.sleep(3)
action.move_to_element(to_scroll_up)
action.perform()
time.sleep(3)
action.move_to_element(to_scroll)
action.perform()
time.sleep(3)
How to solve it?

You can do this with an explicit wait, which makes sure the element is actually visible in the UI before you proceed. Identify an element that only gets loaded once the page is complete and replace XPATH_of_element with its XPath. Also, add this code before extracting the data so it works properly.
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

WebDriverWait(driver, 15).until(EC.visibility_of_element_located((By.XPATH, 'XPATH_of_element')))
You can visit this blog by Shreya Bose for more information.
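Since the results in Sales Navigator are lazy-loaded as you scroll, an explicit wait on a single element is usually not enough by itself. Below is a minimal sketch that combines the explicit wait with scrolling the results list until the expected number of rows shows up, reusing the driver from the question. The container id and row selector (search-results-container, li.artdeco-list__item) are assumptions, so inspect the page and adjust them:
import time
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(driver, 15)
# Wait until the first result row is visible (selector is an assumption, adjust it).
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "li.artdeco-list__item")))

# The results list has its own scrollbar, so scroll the container, not the window.
container = driver.find_element(By.ID, "search-results-container")  # assumed container id
expected = 25
for _ in range(15):
    rows = driver.find_elements(By.CSS_SELECTOR, "li.artdeco-list__item")
    if len(rows) >= expected:
        break
    driver.execute_script("arguments[0].scrollTop = arguments[0].scrollHeight;", container)
    time.sleep(1)  # short pause so the next batch can render before the next check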

Related

Python Selenium iterate table of links clicking each link

So this question has been asked before, but I am still struggling to get it working.
The webpage has a table of links, and I want to iterate through it, clicking each link.
This is my code so far:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome(executable_path=r'C:\Users\my_path\chromedriver_96.exe')
driver.get(r"https://www.fidelity.co.uk/shares/ftse-350/")

try:
    element = WebDriverWait(driver, 20).until(
        EC.presence_of_element_located((By.CLASS_NAME, "table-scroll")))
    table = element.find_elements_by_xpath("//table//tbody/tr")
    for row in table[1:]:
        print(row.get_attribute('innerHTML'))
        # link.click()
finally:
    driver.close()
Sample of output
<td>FOUR</td>
<td>4imprint Group plc</td>
<td>Media & Publishing</td>
<td>888</td>
<td>888 Holdings</td>
<td>Hotels & Entertainment Services</td>
<td>ASL</td>
<td>Aberforth Smaller Companies Trust</td>
<td>Collective Investments</td>
How do I click the href and iterate to the next href?
Many thanks.
Edit
I went with this solution (a few small tweaks on Prophet's solution):
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
from selenium.common.exceptions import StaleElementReferenceException
import time

driver = webdriver.Chrome(executable_path=r'C:\Users\my_path\chromedriver_96.exe')
driver.get(r"https://www.fidelity.co.uk/shares/ftse-350/")
actions = ActionChains(driver)

#close the cookies banner
WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.ID, "ensCloseBanner"))).click()
#wait for the first link in the table
WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//table//tbody/tr/td/a")))
#extra wait to make sure all the links are loaded
time.sleep(1)

#get the total number of links
links = driver.find_elements_by_xpath('//table//tbody/tr/td/a')
for index, val in enumerate(links):
    try:
        #get the links again after getting back to the initial page in the loop
        links = driver.find_elements_by_xpath('//table//tbody/tr/td/a')
        #scroll to the n-th link, it may be out of the initially visible area
        actions.move_to_element(links[index]).perform()
        links[index].click()
        #scrape the data on the new page and get back with the following command
        driver.execute_script("window.history.go(-1)") #you can alternatively use driver.back()
        WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//table//tbody/tr/td/a")))
        time.sleep(2)
    except StaleElementReferenceException:
        pass
To do what you want here, you first need to close the cookies banner at the bottom of the page.
Then you can iterate over the links in the table.
Since clicking each link opens a new page, after scraping the data there you will have to go back to the main page and get the next link. You cannot just collect all the links into a list and then iterate over that list, because navigating to another page makes all the elements Selenium grabbed on the initial page stale.
Your code can be something like this:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
import time

driver = webdriver.Chrome(executable_path=r'C:\Users\my_path\chromedriver_96.exe')
driver.get(r"https://www.fidelity.co.uk/shares/ftse-350/")
actions = ActionChains(driver)

#close the cookies banner
WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.ID, "ensCloseBanner"))).click()
#wait for the first link in the table
WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//table//tbody/tr/td/a")))
#extra wait to make sure all the links are loaded
time.sleep(1)

#get the total number of links
links = driver.find_elements_by_xpath('//table//tbody/tr/td/a')
for index, val in enumerate(links):
    #get the links again after getting back to the initial page in the loop
    links = driver.find_elements_by_xpath('//table//tbody/tr/td/a')
    #scroll to the n-th link, it may be out of the initially visible area
    actions.move_to_element(links[index]).perform()
    links[index].click()
    #scrape the data on the new page and get back with the following command
    driver.execute_script("window.history.go(-1)") #you can alternatively use driver.back()
    WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//table//tbody/tr/td/a")))
    time.sleep(1)
You basically have to do the following:
Click on the cookies button if it is available.
Get all the links on the page.
Iterate over the list of links, scroll each one into view, click it, and then navigate back to the original page.
Code:
driver = webdriver.Chrome(driver_path)
driver.maximize_window()
wait = WebDriverWait(driver, 30)
driver.get("https://www.fidelity.co.uk/shares/ftse-350/")

try:
    wait.until(EC.element_to_be_clickable((By.ID, "ensCloseBanner"))).click()
    print('Clicked on the cookies button')
except:
    print('Could not click on the cookies button')
    pass

driver.execute_script("window.scrollTo(0, 750)")
try:
    all_links = wait.until(EC.presence_of_all_elements_located((By.XPATH, "//table//tbody/tr/td/a")))
    print("We have got to deal with", len(all_links), 'links')
    j = 0
    for link in range(len(all_links)):
        links = wait.until(EC.presence_of_all_elements_located((By.XPATH, "//table//tbody/tr/td/a")))
        driver.execute_script("arguments[0].scrollIntoView(true);", links[j])
        time.sleep(1)
        links[j].click()
        # here write the code to scrape something once the click is performed
        time.sleep(1)
        driver.execute_script("window.history.go(-1)")
        j = j + 1
        print(j)
except:
    print('Bot could not execute all the links properly')
    pass
Imports:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import time
PS: to handle the stale element reference you have to re-define the list of web elements inside the loop.
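If the detail pages can be opened directly by URL, another way to sidestep stale element references entirely is to collect the href values first (plain strings never go stale) and navigate with driver.get() instead of clicking. A rough sketch, reusing the wait object and the table locator from the answers above:
# Collect the hrefs up front, then visit each detail page directly.
links = wait.until(EC.presence_of_all_elements_located((By.XPATH, "//table//tbody/tr/td/a")))
hrefs = [link.get_attribute("href") for link in links]

for href in hrefs:
    driver.get(href)
    # scrape the detail page here
driver.get("https://www.fidelity.co.uk/shares/ftse-350/")  # return to the list when finished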

Unable to Populate Search Field in Python Selenium

I am trying to search www.oddschecker.com using Python and Selenium (chromedriver), but I am struggling to populate the search field with the search term.
The code below clicks the magnifying glass fine, but the send_keys statement does not populate the expanded search box.
Can anyone see the issue?
driver.find_element_by_xpath('//*[@id="top-menu"]/li[8]/ul/li[1]/span/span').click()
sleep(5)
driver.find_element_by_xpath('/html/body/div[1]/header/div[1]/div/div[1]/div/ul/li[8]/ul/li[1]/div/div/div/form/fieldset/input').send_keys('Bicep')
That absolute XPath is brittle; use a relative XPath instead (see below).
Use explicit waits.
The close button of the modal pop-up may or may not appear, so it is better to wrap that click in a try/except block.
Code:
driver = webdriver.Chrome(driver_path)
driver.maximize_window()
#driver.implicitly_wait(30)
wait = WebDriverWait(driver, 30)
driver.get("http://www.oddschecker.com/")

try:
    wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "span.js-close-class"))).click()
    print('Clicked on the close button')
except:
    pass

wait.until(EC.element_to_be_clickable((By.XPATH, "//span[starts-with(@data-ng-click,'NavSearchCtrl')]"))).click()
search = wait.until(EC.visibility_of_element_located((By.ID, "search-input")))
search.send_keys('Bicep', Keys.RETURN)
Imports:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC

Selenium crashes when I'm trying to parse the next page (and seven after it) on a website. Any way to tackle this?

I want to parse an IMDb film rating list located here, spread over around 8 pages. To do that I'm using Selenium, and I'm having trouble with the clicks that move the scraper on to the next page. In the end I need 1000 titles, which I'll then process with BeautifulSoup. The code below isn't working; I need to use the 'NEXT' button, which has this HTML:
<a class="flat-button lister-page-next next-page" href="/list/ls000004717/?page=2">
Next
</a>
This is the code:
from selenium import webdriver as wb
browser = wb.Chrome()
browser.get('https://www.imdb.com/list/ls000004717/')
field = browser.find_element_by_name("flat-button lister-page-next next-page").click()
Error is the following:
NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":".flat-button lister-page-next next-page"}
(Session info: chrome=78.0.3904.108)
I suppose I lack knowledge of the syntax needed, or maybe I mixed something up. I tried searching on SO, but every example is pretty unique and I don't have the knowledge to extrapolate from those cases. Is there any way Selenium can handle this?
You could try using an XPath to query on the Next text inside the button. You should also probably invoke WebDriverWait since you are navigating across multiple pages, then scroll into view since this is at the bottom of the page:
from selenium import webdriver as wb
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
from time import sleep

browser = wb.Chrome()
browser.get('https://www.imdb.com/list/ls000004717/')

# keep clicking next until we reach the end
for i in range(0, 9):
    # wait up to 10s before locating the next button
    try:
        next_button = WebDriverWait(browser, 10).until(EC.element_to_be_clickable((By.XPATH, "//a[contains(@class, 'page') and contains(text(), 'Next')]")))
        # scroll down to the button using Javascript
        browser.execute_script("arguments[0].scrollIntoView(true);", next_button)
        # click the button
        # next_button.click() throws an exception -- replaced with a JS click
        browser.execute_script("arguments[0].click();", next_button)
        # I never recommend using sleep like this, but WebDriverWait is not waiting on the next button to fully load, so it goes stale.
        sleep(5)
    # case: next button no longer exists, we have reached the end
    except TimeoutException:
        break
I also wrapped everything in a try / except TimeoutException block to handle the case where we have reached the end of pages, and Next button no longer exists, thus breaking out of the loop. This worked on multiple pages for me.
I also had to add an explicit sleep(5) because even after invoking WebDriverWait on element_to_be_clickable, next_button was still throwing StaleElementReferenceException. It seems like WebDriverWait was finishing before page was fully loaded, causing the status of next_button to change after it had been located. Normally adding sleep(5) is bad practice, but there did not seem to be another workaround here. If anyone else has a suggestion on this, feel free to comment / edit the answer.
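One possible replacement for the fixed sleep(5), sketched here without any guarantee for IMDb's markup, is to keep a reference to the button you just clicked and wait for it to go stale, which signals that the list has been re-rendered:
# Inside the loop, after locating and clicking next_button:
browser.execute_script("arguments[0].click();", next_button)
# Blocks until the clicked element is detached from the DOM, i.e. the next page has replaced it.
WebDriverWait(browser, 10).until(EC.staleness_of(next_button))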
There are a couple of ways that could work:
1. Use a selector for the next button and loop until the end:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as ec

browser = webdriver.Chrome()
browser.get('https://www.imdb.com/list/ls000004717/')

selector = 'a[class*="next-page"]'
num_pages = 10

for page in range(num_pages):
    # Wait for the element to load
    WebDriverWait(browser, 10).until(ec.presence_of_element_located((By.CSS_SELECTOR, selector)))
    # ... Do rating parsing here
    browser.find_element_by_css_selector(selector).click()
2. Instead of clicking on the element, the other option is to navigate to the next page using browser.get('...'):
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as ec

# Set up browser as before and navigate to the page
browser = webdriver.Chrome()
browser.get('https://www.imdb.com/list/ls000004717/')

selector = 'a[class*="next-page"]'
base_url = 'https://www.imdb.com/list/ls000004717/'
page_extension = '?page='
num_pages = 10

# Already at page 1, so it only needs to loop 9 more times
for page in range(2, num_pages + 1):
    # Wait for the page to load
    WebDriverWait(browser, 10).until(ec.presence_of_element_located((By.CSS_SELECTOR, selector)))
    # ... Do rating parsing here
    next_page = base_url + page_extension + str(page)
    browser.get(next_page)
As a note: field = browser.find_element_by_name("...").click() will not assign field to a webelement, as the click() method has no return value.
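In other words, keep the lookup and the click as separate statements if you still need the element afterwards, for example:
# find_element returns the element; click() returns None, so don't assign its result.
next_button = browser.find_element_by_css_selector('a[class*="next-page"]')
next_button.click()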
You could try a partial css selector.
browser.find_element_by_css_selector("a[class*='next-page']").click()
To keep clicking on the element with the text NEXT until the '901 - 1,000 of 1,000' page, you have to:
scrollIntoView() the element once visibility_of_element_located() is achieved.
Induce WebDriverWait for element_to_be_clickable().
You can use the following solution:
Code Block:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
driver.get('https://www.imdb.com/list/ls000004717/')
driver.execute_script("return arguments[0].scrollIntoView(true);", WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "span.pagination-range"))))

while True:
    try:
        WebDriverWait(driver, 20).until(EC.invisibility_of_element((By.CSS_SELECTOR, "div.row.text-center.lister-working.hidden")))
        driver.execute_script("return arguments[0].scrollIntoView(true);", WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "span.pagination-range"))))
        WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "a.flat-button.lister-page-next.next-page"))).click()
        print("Clicked on NEXT button")
    except TimeoutException as e:
        print("No more NEXT button")
        break

driver.quit()
Console Output:
Clicked on NEXT button
Clicked on NEXT button
Clicked on NEXT button
Clicked on NEXT button
Clicked on NEXT button
Clicked on NEXT button
Clicked on NEXT button
Clicked on NEXT button
Clicked on NEXT button
No more NEXT button

Selenium Driver wait for SVG to be completely rendered

I'm using Selenium with the Chrome driver to scrape pages that contain an SVG.
I need a way to make Selenium wait until the SVG is completely loaded, otherwise I get some incomplete charts when I scrape.
At the moment the script waits 10 seconds before it starts scraping, but that is a lot when scraping 20,000 pages.
def page_loaded(driver):
    path = "//*[local-name() = 'svg']"
    time.sleep(10)
    return driver.find_element_by_xpath(path)

wait = WebDriverWait(self.driver, 10)
wait.until(page_loaded)
Is there any efficient way to check whether the SVG is loaded before starting to scrape?
An example from Selenium documentation:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(driver, 10)
element = wait.until(EC.element_to_be_clickable((By.ID, 'someid')))
So in your case it should be:
from selenium.webdriver.support import expected_conditions as EC
wait = WebDriverWait(self.driver, 10)
element = wait.until(EC.presence_of_element_located((By.XPATH, path)))
Here, 10 in WebDriverWait(driver, 10) is the maximum wait in seconds, i.e. it waits until either 10 seconds have passed or the condition is met, whichever comes first.
Some common conditions that are frequently of use when automating web browsers:
title_is
title_contains
presence_of_element_located
visibility_of_element_located
visibility_of
presence_of_all_elements_located
text_to_be_present_in_element
text_to_be_present_in_element_value
etc.
More available here.
Also here's the documentation for expected conditions support.
Another way you can tackle this is to write your own method, like:
def find_svg(driver):
    element = driver.find_element_by_xpath(path)
    if element:
        return element
    else:
        return False
And then call WebDriverWait like:
element = WebDriverWait(driver, max_secs).until(find_svg)
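If "completely loaded" means the chart has actually finished drawing, waiting for the <svg> node alone may still fire too early. A rough sketch of a custom condition that waits until the SVG contains shape elements and their count has stopped growing between polls; it assumes the chart appends path/rect/circle nodes while rendering, so adjust it to whatever your charting library produces:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

shape_xpath = ("//*[local-name()='svg']//*[local-name()='path' or "
               "local-name()='rect' or local-name()='circle']")

class SvgStoppedGrowing:
    """True once the SVG contains shapes and their count is unchanged between two polls."""
    def __init__(self):
        self.last_count = -1
    def __call__(self, driver):
        shapes = driver.find_elements_by_xpath(shape_xpath)
        done = bool(shapes) and len(shapes) == self.last_count
        self.last_count = len(shapes)
        return done

WebDriverWait(driver, 10, poll_frequency=1).until(SvgStoppedGrowing())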

Unable to use "explicit wait" in the right way

I've written a script in Python with Selenium to click on some links listed in the sidebar of Google Maps. When any of the items gets clicked, the related information attached to each lead shows up in the right-hand area. The script is doing fine. However, I've used a hardcoded delay to do the job. How can I get rid of the hardcoded delay and achieve the same with an explicit wait? Thanks in advance.
Link to the site: website
The script I'm trying with:
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

link = "replace_with_above_link"

driver = webdriver.Chrome()
driver.get(link)
wait = WebDriverWait(driver, 10)

for item in wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "[id^='rlimg0_']"))):
    item.location
    time.sleep(3) #wish to try with explicit wait but can't find any idea
    item.click()

driver.quit()
I tried with wait.until(EC.staleness_of(item)) instead of hardcoded delay but no luck.
If you want to wait until new data is displayed after each click, you may try the below:
for item in wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "[id^='rlimg0_']"))):
    div = driver.find_element_by_xpath("//div[@class='xpdopen']")
    item.location
    item.click()
    wait.until(EC.staleness_of(div))
