I am having a terribly hard time referencing to a certain "next page" button on a website that I am trying to scrape links from [https://www.sreality.cz/adresar?strana=2]. If you scroll down you can see a red right arrow button that you can click to go to the next page and so the website load new dynamic content. Every approach seems to report the same exact error and I don't know how am I supposed to point to the element without running into it.
This is the code that I currently have :
from selenium import webdriver
chromedriver_path = "/home/user/Dokumenty/iCloud/RealityScraper/chromedriver"
driver = webdriver.Chrome(chromedriver_path)
print("WebDriver Successfully Initialized")
driver.get("https://www.sreality.cz/adresar?strana=2")
links = driver.find_elements_by_css_selector("h2.title a")
nextPage = driver.find_element_by_css_selector("li.paging-item a.btn-paging-pn.icof.icon-arr-right.paging-next")
for link in links:
print(link.get_attribute("href"))
nextPage.click()
The "nextPage" variable is holding a supposed value to be clicked on once the "links" variable search finishes scraping all the links from the company titles. However when I run this code I get an error :
selenium.common.exceptions.StaleElementReferenceException: Message:
stale element reference: element is not attached to the page document
I have been searching for various fixes online but none of them seemed to resolve the issue. I think that the issue at this point is not caused by the element not loading quickly enough but rather Selenium having trouble finding the element because of wrong reference.
Because of this I have tried using XPath to accurately point to the actual element and so I changed the "nextPage" variable to :
nextPage = driver.find_element_by_xpath("""/html/body/div[2]/div[1]/div[2]/div[2]/div[4]/div/div/div/div[2]/div/div[2]/ul[1]/li[12]/a""")
Which returns exactly the same error as stated above. I have been trying to find a solution to this for hours now and I can't understand where the issue lies. I would be grateful if anyone could explain to me what am I doing wrong. Thanks to anyone.
If you want to get all the ng-href tags from every page. Or you could look into their api.
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from time import sleep
driver.get("https://www.sreality.cz/adresar?strana=2")
wait = WebDriverWait(driver, 10)
while True:
try:
links = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "h2.title > a")))
#print(len(links))
for link in links:
print(link.get_attribute("ng-href"))
nextPage = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "a.btn-paging-pn.icof.icon-arr-right.paging-next")))
nextPage.click()
time.sleep(10)
except Exception as e:
print(e)
break
First of all never use the absolute xpath it will breakdown easily, Use the relative xpath.
Secondly, i think the error you are getting is because after clicking "Next" button for the first time it loads a new page. Which has a different DOM structure and that's why you are not able to find that element.
You can try searching for the element after every new page load (after clicking "Next" button everytime.)
// imports
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By
// initialize
driver = webdriver.Chrome()
wait = WebDriverWait(driver, 20)
action = ActionChains(driver)
// Try to use the below code and see if it works.
Next_btn = wait.until(EC.presence_of_element_located((By.XPATH, '(//li[#class="paging-item"])[2]')))
action.move_to_element(Next_btn).click().perform()
Related
Hoping you can help. I'm relatively new to Python and Selenium. I'm trying to pull together a simple script that will automate news searching on various websites. The primary focus was football and to go and get me the latest Manchester United news from a couple of places and save the list of link titles and URLs for me. I could then look through the links myself and choose anything I wanted to review.
In trying the the independent newspaper (https://www.independent.co.uk/) I seem to have come up against a problem with element not interactable when using the following approaches:
import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome('chromedriver')
driver.get('https://www.independent.co.uk')
time.sleep(3)
#accept the cookies/privacy bit
OK = driver.find_element_by_id('qcCmpButtons')
OK.click()
#wait a few seconds, just in case
time.sleep(5)
search_toggle = driver.find_element_by_class_name('icon-search.dropdown-toggle')
search_toggle.click()
This throws the selenium.common.exceptions.ElementNotInteractableException: Message: element not interactable error
I've also tried with XPATH
search_toggle = driver.find_element_by_xpath('//*[#id="quick-search-toggle"]')
and I also tried ID.
I did a lot of reading on here and then also tried using WebDriverWait and execute_script methods:
element = WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.XPATH, '//*[#id="quick-search-toggle"]')))
driver.execute_script("arguments[0].click();", element)
This didn't seem to error but the search box never appeared, i.e. the appropriate click didn't happen.
Any help you could give would be fantastic.
Thanks,
Pete
Your locator is //*[#id="quick-search-toggle"], there are 2 on the page. The first is invisible and the second is visible. By default selenium refers to the first element, sadly the element you mean is the second one, so you need another unique locator. Try this:
search_toggle = WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, '//div[#class="row secondary"]//a[#id="quick-search-toggle"]')))
search_toggle.click()
First you need to open search box, then send search keys:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.options import Options
import os
chrome_options = Options()
chrome_options.add_argument("--start-maximized")
browser = webdriver.Chrome(executable_path=os.path.abspath(os.getcwd()) + "/chromedriver", options=chrome_options)
link = 'https://www.independent.co.uk'
browser.get(link)
# accept privacy
button = browser.find_element_by_xpath('//*[#id="qcCmpButtons"]/button').click()
# open search box
li = browser.find_element_by_xpath('//*[#id="masthead"]/div[3]/nav[2]/ul/li[1]')
search_tab = li.find_element_by_tag_name('a').click()
# send keys to search box
search = browser.find_element_by_xpath('//*[#id="gsc-i-id1"]')
search.send_keys("python")
search.send_keys(Keys.RETURN)
Can you try with below steps
search_toggle = driver.find_element_by_xpath('//*[#class="row secondary"]/nav[2]/ul/li[1]/a')
search_toggle.click()
Trying got automate a task via Selenium python i have the issue where the for each section does work only the first time, after that does not see the second variable. Also tried to add delays so the webpage would be fully loaded but the same issue.
I tested different scenarios that i found in the internet also so manual tests i did, but looks like the second div is not recognizable also the rest of the divs
for server in browser.find_elements_by_xpath("//*[starts-with(#id,'server-list-')]"):
#try:
print("Server Section----")
time.sleep(5)
#Print server name
print(server.text)
#clicn on button inside the server
server.click()
#back into the server listing
browser.back()
Basically the automation need to enter every server ( div starting with id server-list- ) click on it, after entering on that section click another button and than back to the main page.
A stale element reference exception is thrown because of below reason
element has been deleted entirely.
element is no longer attached to the DOM.
Please check your element is still present on UI with the same element XPath which you are using while interacting with it
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
driver.get("http://somedomain/url_that_delays_loading")
try:
element = WebDriverWait(driver, 10).until(
EC.presence_of_element_located((By.XPATH, "//*[starts-with(#id,'server-list-')]"))
)
for i in range(len(element)):
element[i].click()
driver.back()
element = wait.until(EC.presence_of_element_located((By.XPATH, "//*[starts-with(#id,'server-list-')]")))
finally:
driver.quit()
With limited information provided, you can try code below:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
browser = webdriver.Chrome()
wait = WebDriverWait(browser, 10)
servers = wait.until(EC.visibility_of_all_elements_located((By.XPATH, "//*[starts-with(#id,'server-list-')]")))
servers_count = len(servers)
for i in range(servers_count):
print(servers[i].text)
servers[i].click()
browser.back()
servers = wait.until(EC.visibility_of_all_elements_located((By.XPATH, "//*[starts-with(#id,'server-list-')]")))
You capture the server list that no more exists after you have drilled into the particular list item. So when you get back to the list that old one does not exist any more hence all the items (including that one you expect to move next to) are stale.
You need to rework your logic so that you get the list of servers each time you get back from the server details and store somethere the flag that would let your script know which items you have already visited.
I've written a script using python with selenium to click on some links listed in the sidebar of google maps. When any of the items get clicked, the related information attached to each lead shows up in the right sided area. The script is doing fine. However, I've used hardcoded delay to do the job. How can I get rid of hardcoded delay by achieving the same with explicit wait. Thanks in advance.
Link to the site: website
The script I'm trying with:
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
link = "replace_with_above_link"
driver = webdriver.Chrome()
driver.get(link)
wait = WebDriverWait(driver, 10)
for item in wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "[id^='rlimg0_']"))):
item.location
time.sleep(3) #wish to try with explicit wait but can't find any idea
item.click()
driver.quit()
I tried with wait.until(EC.staleness_of(item)) instead of hardcoded delay but no luck.
If you want to wait until new data displayed after each clcik you may try below:
for item in wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "[id^='rlimg0_']"))):
div = driver.find_element_by_xpath("//div[#class='xpdopen']")
item.location
item.click()
wait.until(EC.staleness_of(div))
I've written a script in python in combination with selenium to parse names from a webpage. The data from that site is not javascript enabled. However, the next page links are within javascript. As the next page links of that webpage are of no use if I go for requests library, I have used selenium to parse the data from that site traversing 25 pages. The only problem I'm facing here is that although my scraper is able to reach the last page clicking through 25 pages, it only fetches the data from the first page only. Moreover, the scraper keeps running even though it has done clicking the last page. The next page links look exactly like javascript:nextPage();. Btw, the url of that site never changes even if I click on the next page button. How can i get all the names from 25 pages? The css selector I've used in my scraper is flawless. Thanks in advance.
Here is what I've written:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
wait = WebDriverWait(driver, 10)
driver.get("https://www.hsi.com.hk/HSI-Net/HSI-Net?cmd=tab&pageId=en.indexes.hscis.hsci.constituents&expire=false&lang=en&tabs.current=en.indexes.hscis.hsci.overview_des%5Een.indexes.hscis.hsci.constituents&retry=false")
while True:
for name in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "table.greygeneraltxt td.greygeneraltxt,td.lightbluebg"))):
print(name.text)
try:
n_link = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "a[href*='nextPage']")))
driver.execute_script(n_link.get_attribute("href"))
except: break
driver.quit()
You don't have to handle "Next" button or somehow change page number - all entries are already in page source. Try below:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
wait = WebDriverWait(driver, 10)
driver.get("https://www.hsi.com.hk/HSI-Net/HSI-Net?cmd=tab&pageId=en.indexes.hscis.hsci.constituents&expire=false&lang=en&tabs.current=en.indexes.hscis.hsci.overview_des%5Een.indexes.hscis.hsci.constituents&retry=false")
for name in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "table.greygeneraltxt td.greygeneraltxt,td.lightbluebg"))):
print(name.get_attribute('textContent'))
driver.quit()
You can also try this solution if it's not mandatory for you to use Selenium:
import requests
from lxml import html
r = requests.get("https://www.hsi.com.hk/HSI-Net/HSI-Net?cmd=tab&pageId=en.indexes.hscis.hsci.constituents&expire=false&lang=en&tabs.current=en.indexes.hscis.hsci.overview_des%5Een.indexes.hscis.hsci.constituents&retry=false")
source = html.fromstring(r.content)
for name in source.xpath("//table[#class='greygeneraltxt']//td[text() and position()>1]"):
print(name.text)
It appears this can actually be done more simply than the current approach. After the driver.get method, you can simply use the page_source property to get the html behind it. From there you can get out data from all 25 pages at once. To see how it's structured, just right click and "view source" in chrome.
html_string=driver.page_source
What i am trying to do is make a simple program that lets me run and it basically goes to Torrentz and follows a few link to finally be able to download the file through uttorent. Below is what i have coded so far and i cant seem to make the variable linkElem work. And i also cant seem to make linkElem.find_elements_by_xpath go to the link necessary. If you think you know what is wrong, please do help.
Thanks.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
browser = webdriver.Firefox()
browser.get('https://torrentz.eu/')
searchElem = browser.find_element_by_id('thesearchbox')
searchElem.send_keys('Limitless')
searchButton = browser.find_element_by_id('thesearchbutton')
searchButton.click()
linkElem = linkElem.find_elements_by_xpath("//div[#class='results']//a[#href='/9ea0c575520a3065d85b285c9474231192368db7']")
#wait = WebDriverWait(browser, 6)
#linkElem = wait.until(EC.visibility_of_element_located((By.href, "/9ea0c575520a3065d85b285c9474231192368db7")))
#linkElem.clear()
#linkElem = browser.find_element_by_link_text('S01E20 HDTV x264 LOL ettv')
#linkElem.click()
#SignIn = browser.find_elements_by_id('signIn')
#SignIn.click()
#passwordElem.submit()
I don't think you can and should rely on the href attribute value. Instead, get the links from under the dl elements inside the search results container. Also, add a wait:
# wait for search results to appear
wait = WebDriverWait(browser, 6)
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.results dl")))
links = driver.find_elements_by_css_selector("div.results dl dt a")
links[0].click()
links in your case would contain all of the search results links, links[0] is the first link.