I'm running into the infamous StaleElementReferenceException error with Selenium. I've checked previous questions on the subject, and the common solution is to add an implicit wait, an explicit wait, or time.sleep to give the website time to load. I've tried this, but I am still getting the error. Can anyone tell what the issue is?
Here is my code:
links = driver.find_elements_by_css_selector("a.overline-productName")
time.sleep(2)
# finds pricing data of links on page
link_count = 0
for element in links:
    links[link_count].click()
    cents = driver.find_element_by_css_selector("span.cents")
    dollar = driver.find_element_by_css_selector("span.dollar")
    text_price = dollar.text + "." + cents.text
    price = float(text_price)
    print(price)
    print(link_count)
    driver.execute_script("window.history.go(-1)")
    link_count = link_count + 1
    time.sleep(5)
What am I missing?
You're storing your links in a list. The second you follow a link to another page, that set of links is stale. So the next iteration in your loop will attempt to click a stale link from the list.
Even if you go back in history as you do later, that original element reference is gone.
Your best bet is to loop through based on index, and find the links each time you return to the page.
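A minimal sketch of that index-based approach, assuming `driver` is a Selenium WebDriver that is already on the listing page. The helper name `visit_each_link` and the `handle_page` callback are hypothetical names used only for illustration. It calls the generic `find_elements(by, value)` method, because recent Selenium 4 releases no longer ship the `find_elements_by_*` helpers.

```python
import time

CSS = "css selector"  # the same string Selenium exposes as By.CSS_SELECTOR


def visit_each_link(driver, selector, handle_page, pause=1.0):
    """Click every link matching `selector`, re-finding the links after each return.

    `driver` is assumed to be a Selenium WebDriver already on the listing page;
    `handle_page` is a hypothetical callback that scrapes the page that opens.
    """
    results = []
    index = 0
    while True:
        # Look the links up again: references found before navigating are stale.
        links = driver.find_elements(CSS, selector)
        if index >= len(links):
            break
        links[index].click()
        time.sleep(pause)  # crude fixed wait, as in the question
        results.append(handle_page(driver))
        driver.back()
        time.sleep(pause)
        index += 1
    return results
```

With the selector from the question, a call might look like `visit_each_link(driver, "a.overline-productName", read_price)`, where `read_price` is a hypothetical function that reads the dollar and cents spans from the product page.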
Related
I want to collect data from website pages with Python and Selenium.
The website is a news site, and I have reached the page where the links to the different news articles are listed.
This is my code:
# finding list of news articles
all_links = driver.find_elements_by_tag_name('article.post a')
print(len(all_links))  # I got 10 different articles
for element in all_links:
    print(element.get_attribute('outerHTML'))  # if I print only this, I get 10 different HTML-s
    link = element.click()  # clicking on the link to go to specific page
    time.sleep(1)
    # DATES
    date = driver.find_element_by_tag_name('article header span.no-break-text.lite').text
    print(date)
    # up to here everything works for the first element
But I get an error when the loop moves on to the second element.
So, I'm getting good results for the first element in the list, but then I get this:
StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
(Session info: chrome=92.0.4515.159)
I have tried adding time.sleep(4) pauses, driver.close(), and driver.back() after each iteration, but the error is the same.
What am I doing wrong?
You need to find the list of web elements again inside the for loop.
Explanation:
The exact problem is this: when you click the first element, the browser goes to that article's page, and when you come back using
driver.execute_script("window.history.go(-1)") the elements found earlier have become stale (this is how Selenium works), so they have to be found again before you can interact with them. See below for an illustration:
# finding list of news articles
all_links = driver.find_elements_by_tag_name('article.post a')
print(len(all_links))  # I got 10 different articles
for j in range(len(all_links)):
    # find the links again on every pass; the earlier references are stale after navigating
    elements = driver.find_elements_by_tag_name('article.post a')
    print(elements[j].get_attribute('outerHTML'))  # if I print only this, I get 10 different HTML-s
    elements[j].click()  # clicking on the link to go to specific page
    time.sleep(1)
    # DATES
    date = driver.find_element_by_tag_name('article header span.no-break-text.lite').text
    print(date)
    time.sleep(1)
    # go back to the previous page; driver.back() can be used instead if it works for you
    driver.execute_script("window.history.go(-1)")
    time.sleep(1)
You are facing a classic case of StaleElementReferenceException.
Initially you picked a list of elements with
all_links = driver.find_elements_by_tag_name('article.post a')
But once you click the first link and are taken to another page, the previously collected references (pointers) to the web elements on the initial page become stale, since those elements are no longer present on the new page.
So even if you go back to the initial page, these references are no longer valid.
To continue you will have to get the links again.
You can do this as follows:
# finding list of news articles
all_links = driver.find_elements_by_tag_name('article.post a')
print(len(all_links))  # I got 10 different articles
for i in range(len(all_links)):
    # get all the elements again
    elements = driver.find_elements_by_tag_name('article.post a')
    # get the i-th element from the list and click it
    elements[i].click()  # clicking on the link to go to specific page
    time.sleep(1)
    # DATES
    date = driver.find_element_by_tag_name('article header span.no-break-text.lite').text
    print(date)
    # get back to the previous page
    driver.execute_script("window.history.go(-1)")
    time.sleep(1)
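An alternative that avoids stale references altogether, offered as a sketch rather than as part of either answer: read each link's `href` up front (plain strings cannot go stale) and open the URLs directly. This assumes every link carries a real `href` that can be opened on its own. The function name `collect_dates` is hypothetical; the selectors passed in would be the ones from the question.

```python
CSS = "css selector"  # the same string Selenium exposes as By.CSS_SELECTOR


def collect_dates(driver, link_selector, date_selector):
    """Return the date text from every article linked on the current page.

    `driver` is assumed to be a Selenium WebDriver on the listing page.
    """
    # Copy the URLs out while the elements are still attached to the page.
    links = driver.find_elements(CSS, link_selector)
    urls = [link.get_attribute("href") for link in links]
    # Drop missing hrefs and duplicates while keeping the original order.
    urls = list(dict.fromkeys(url for url in urls if url))

    dates = []
    for url in urls:
        driver.get(url)  # navigate directly: no click and no going back
        dates.append(driver.find_element(CSS, date_selector).text)
    return dates
```

Because the loop iterates over strings, nothing from the listing page has to survive the navigation, so there is no need to re-find elements or return to the list.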
Context
This is a repost of Get a page with Selenium but wait for element value to not be empty, which was Closed without any validity so far as I can tell.
The linked answers in the closure reasoning both rely on knowing what the expected text value will be. Each answer explicitly shows the expected text hardcoded into the WebDriverWait call. Furthermore, neither of the linked answers even remotely touches upon the final part of my question:
[whether the expected conditions] come before or after the page Get
"Duplicate" Questions
How to extract data from the following html?
Assert if text within an element contains specific partial text
Original Question
I'm grabbing a web page using Selenium, but I need to wait for a certain value to load. I don't know what the value will be, only what element it will be present in.
It seems that using the expected condition text_to_be_present_in_element_value or text_to_be_present_in_element is the most likely way forward, but I'm having difficulty finding any actual documentation on how to use these and I don't know if they come before or after the page Get:
webdriver.get(url)
Rephrase
How do I get a page using Selenium but wait for an unknown text value to populate an element's text or value before continuing?
I'm sure that my answer is not the best one, but here is a part of my own code that helped me with a problem similar to yours.
In my case I had trouble with the loading time of the DOM. Sometimes it took 5 seconds, sometimes 1 second, and so on.
url = 'http://www.somesite.com'
browser.get(url)
Because browser.implicitly_wait(7) was not enough in my case, I made a simple for loop to check whether the content is loaded.
some code...
import time  # for the pause between tries

for try_html in range(7):
    # Make 7 tries to check whether the element is loaded
    browser.implicitly_wait(7)
    html = browser.page_source
    soup = BeautifulSoup(html, 'lxml')
    raw_data = soup.find_all('script', type='application/ld+json')
    # If 'sku' is not found in the HTML, skip to the next try;
    # otherwise stop trying and scrape the page
    if 'sku' not in html:
        # implicitly_wait only sets the element-lookup timeout and does not pause,
        # so sleep briefly before the next try
        time.sleep(1)
        continue
    else:
        scrape(raw_data)
        break
It's not perfect, but you can try it.
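For the case in the question, where only the element is known and not its eventual text, here is a hedged sketch: call `get` first, then poll until the element's text is non-empty. The helper name `wait_for_text`, the selector, and the timeout values are assumptions for illustration. Selenium's own `WebDriverWait(driver, timeout).until(...)` also accepts a callable that receives the driver, so it could replace the hand-written loop.

```python
import time

CSS = "css selector"  # the same string Selenium exposes as By.CSS_SELECTOR


def wait_for_text(driver, selector, timeout=10.0, poll=0.5):
    """Poll until the first element matching `selector` has non-empty text.

    Returns the stripped text, or raises TimeoutError if it stays empty for
    `timeout` seconds. `driver` is assumed to be a Selenium WebDriver on which
    get(url) has already been called.
    """
    deadline = time.monotonic() + timeout
    while True:
        # find_elements returns an empty list instead of raising when nothing matches.
        matches = driver.find_elements(CSS, selector)
        text = matches[0].text.strip() if matches else ""
        if text:
            return text
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no text appeared in {selector!r} within {timeout} s")
        time.sleep(poll)
```

The wait comes after the page `get`, since there is no element to inspect before the page has been requested. For an input field, the same loop could read `matches[0].get_attribute("value")` instead of `.text`.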
I have a problem with this particular website (link to website).
I'm trying to create a script that can go through all entries, with the condition that it has "memory" so it can continue from the page it was last on. That means I need to know the current page number AND a direct URL to that page.
Here is what I have so far:
current_page_el = driver.find_element_by_xpath("//ul[contains(@class, 'pagination')]/li[@class='disabled']/a")
current_page = int(current_page_el.text)
current_page_url = current_page_el.get_attribute("href")
That code results in
current_page_url = 'javascript:void(0);'
Is there a way to get the current URL from sites like this? Also, when you click to get to the next page, the URL stays the same as the one I posted at the beginning.
I recently tried scraping, and this time I wanted to go from page to page until I reach the final destination I want. Here's my code:
sub_categories = browser.find_elements_by_class_name("ty-menu__submenu-link")
for sub_category in sub_categories:
    sub_category = str(sub_category.get_attribute("href"))
    if(sub_category is not 'http://www.lsbags.co.uk/all-bags/view-all-handbags-en/' and sub_category is not "None"):
        browser.get(sub_category)
        print("Entered: " + sub_category)
        product_titles = browser.find_elements_by_class_name("product-title")
        for product_title in product_titles:
            final_link = product_title.get_attribute("href")
            if(str(final_link) is not "None"):
                browser.get(str(final_link))
                print("Entered: " + str(final_link))
                #DO STUFF
I already tried the wait and the wrapper (try/except) solutions from here, but I do not get why it is happening. I have an idea why: the browser gets lost when it finishes one item, right?
I don't know how I should express this idea. In my mind I imagine it would be like this:
TIMELINE:
*PAGE 1 is within a loop, ALL THE URLS WITHIN IT IS PROCESSED ONE BY ONE
*The first url of PAGE 1 is caught. Thus do browser.get page turn to PAGE 2
*PAGE 2 has the final list of links I want to evaluate, so another loop here
to get that url, and within that url #DO STUFF
*After #DO STUFF get to the second url of PAGE 2 and #DO STUFF again.
*Let's assume PAGE 2 has only two urls, so it finished looping, so it goes back to PAGE 1
*The second url of PAGE 1 is caught...
and so on... I think I have expressed my idea at some point in my code, but I don't know which part is not working and thus raising the exception.
Any help is appreciated, please help. Thanks!
The problem is that after you navigate to the next page, but before that page has actually been reached, Selenium finds the elements you are waiting for. Those are still the elements of the page you are coming from. Once the next page has loaded, they are no longer attached to the DOM; they have been replaced by the elements of the new page. Selenium then tries to interact with the elements of the former page, which gives a StaleElementReferenceException.
After you click the link for the next page, you have to wait until the next page is completely loaded before you start your loop again.
So you have to find something on your page, other than the elements you are going to interact with, that tells you the next page has loaded.
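One way this could look, as a sketch under stated assumptions: `marker_selector` is a hypothetical CSS selector for something that exists only on the newly loaded page, and the helper names `wait_for_marker` and `hrefs_on_page` are made up for illustration. After the marker shows up, the link URLs are copied out as plain strings, so later navigation cannot make them stale.

```python
import time

CSS = "css selector"       # the same string Selenium exposes as By.CSS_SELECTOR
CLASS_NAME = "class name"  # the same string Selenium exposes as By.CLASS_NAME


def wait_for_marker(driver, marker_selector, timeout=10.0, poll=0.25):
    """Block until an element matching `marker_selector` exists on the page."""
    deadline = time.monotonic() + timeout
    while not driver.find_elements(CSS, marker_selector):
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{marker_selector!r} did not appear within {timeout} s")
        time.sleep(poll)


def hrefs_on_page(driver, url, marker_selector, link_class):
    """Open `url`, wait for its marker, then return the link URLs as plain strings.

    `driver` is assumed to be a Selenium WebDriver.
    """
    driver.get(url)
    wait_for_marker(driver, marker_selector)
    links = driver.find_elements(CLASS_NAME, link_class)
    hrefs = (link.get_attribute("href") for link in links)
    return [href for href in hrefs if href]  # skip links without an href
```

In the nested loops from the question, the outer loop could iterate over sub-category URLs collected this way, and the inner loop over `hrefs_on_page(browser, sub_category_url, marker_selector, "product-title")`, calling `browser.get` on each string.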
I am a rookie with Python and Selenium. I have to navigate through all the members on the members page of an institution on ResearchGate, which means I have to click the first member to go to their profile page, then go back to the members page to click the next member. I tried a for loop, but every time it clicks only on the first member. Could anyone please guide me? Here is what I have tried.
from selenium import webdriver
import urllib
driver = webdriver.Firefox("/usr/local/bin/")
university="Lawrence_Technological_University"
members= driver.get('https://www.researchgate.net/institution/' + university +'/members')
membersList = driver.find_element_by_tag_name("ul")
list = membersList.find_elements_by_tag_name("li")
for member in list:
    driver.find_element_by_class_name('display-name').click()
    print(driver.current_url)
    driver.back()
You are not even doing anything with the list members in your for loop. The state of the page changes after navigating to a different page & coming back, so you need to find the element again. One approach to handle this is given below:
for i in range(len(list)):
    membersList = driver.find_element_by_tag_name("ul")
    element = membersList.find_elements_by_tag_name("li")[i]
    element.click()
    driver.back()