I'm trying to download all the PDFs on a webpage using Selenium Python with Chrome as the browser, but every time the session ends with this message:
StaleElementReferenceException: stale element reference: element is not attached to the page document
(Session info: chrome=52.0.2743.116)
(Driver info: chromedriver=2.22.397933
This is the code:
def download_pdf(self):
    current = self.driver.current_url
    lista_link_temp = self.driver.find_elements_by_xpath("//*[@href]")
    for link in lista_link_temp:
        if "pdf+html" in str(link.get_attribute("href")):
            tutor = link.get_attribute("href")
            self.driver.get(str(tutor))
            self.driver.get(current)
Please help me. I've already tried lambda, implicit, and explicit waits.
Thanks
As soon as you call self.driver.get() in your loop, all the other elements in the list of elements will become stale. Try collecting the href attributes from the elements first, and then visiting them:
def download_pdf(self):
    current = self.driver.current_url
    lista_link_temp = self.driver.find_elements_by_xpath("//*[@href]")
    pdf_hrefs = []

    # You could do this with a one-line list comprehension too, but it would be really long...
    for link in lista_link_temp:
        href = str(link.get_attribute("href"))
        if "pdf+html" in href:
            pdf_hrefs.append(href)

    for h in pdf_hrefs:
        self.driver.get(h)
        self.driver.get(current)
You get a stale element when you search for an element and, before you do any action on it, the page changes or reloads.
Make sure the page is fully loaded before performing any actions on it.
So you first need to add a condition that waits for the page to be loaded, and maybe check that all requests are done.
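For example, a minimal sketch of such a wait (the helper name and the 10-second timeout are assumptions, not part of the original code):

from selenium.webdriver.support.ui import WebDriverWait

def wait_for_page_load(driver, timeout=10):
    # Poll until the browser reports the document as fully loaded
    WebDriverWait(driver, timeout).until(
        lambda d: d.execute_script("return document.readyState") == "complete")

You could call wait_for_page_load(self.driver) right after each self.driver.get(...), before locating elements again.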
I am trying to scrape a website and have written a working script. The problem is that after the script has been running for some time, I get a stale element reference exception telling me the referenced element (the href) was not found.
Here I am extracting the links of all products on each page of a website and saving them in a list, which I later use to extract the data from each link.
for a in tqdm(range(1, pages+1)):
    time.sleep(3)
    link = driver.find_elements_by_xpath('//div[@class="col-xs-4 animation"]/a')
    for b in link:
        x = b.get_attribute("href")
        print(x)
        LINKS.append(x)
    time.sleep(3)
    # next page
    try:
        WebDriverWait(driver, delay).until(ec.presence_of_element_located((By.XPATH, '//ul[@class="pagination-sm pagination"]')))
        next_page = driver.find_element_by_xpath('.//li[@class="prev"]')
        driver.execute_script("arguments[0].click()", next_page)
    except NoSuchElementException:
        pass
Any idea how to fix this? The error occurs randomly: sometimes it finds the links and sometimes it does not, which is confusing. It only happens when I scrape for a long time.
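One pattern that may help (a sketch only, not tested against this site; the helper name, the retry count, and the reuse of the question's selector are assumptions) is to wait for the links to be present and retry the collection if a stale element is hit mid-read:

from selenium.common.exceptions import StaleElementReferenceException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as ec
from selenium.webdriver.support.ui import WebDriverWait

def collect_hrefs(driver, delay=10, retries=3):
    # Wait for the product links to be present, read their hrefs as plain
    # strings, and re-locate the elements if the page re-renders mid-read.
    for _ in range(retries):
        try:
            WebDriverWait(driver, delay).until(ec.presence_of_all_elements_located(
                (By.XPATH, '//div[@class="col-xs-4 animation"]/a')))
            links = driver.find_elements_by_xpath('//div[@class="col-xs-4 animation"]/a')
            return [b.get_attribute("href") for b in links]
        except StaleElementReferenceException:
            continue
    return []

The inner for b in link loop could then become LINKS.extend(collect_hrefs(driver)).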
I need to find and store the locations of some elements so the bot can click on those elements even if the page changes. I have read online that, for a single element, storing the location of that element in a variable can help; however, I could not find a way to store the locations of multiple elements in Python. Here is my code:
comment_button = driver.find_elements_by_css_selector("svg[aria-label='Comment']")
for element in comment_button:
    comment_location = element.location
sleep(2)
for element in comment_location:
    element.click()
This code gives this error:
line 44, in <module>
element.click()
AttributeError: 'str' object has no attribute 'click'
Is there a way to do this so that when the page refreshes, the script can store the locations and move on to the next location to execute element.click() without any errors?
I have tried implementing ActionChains in my code:
comment_button = driver.find_elements_by_css_selector("svg[aria-label='Comment']")
for element in comment_button:
    ac = ActionChains(driver)
    element.click()
    ac.move_to_element(element).move_by_offset(0, 0).click().perform()
    sleep(2)
    comment_button = driver.find_element_by_css_selector("svg[aria-label='Comment']")
    comment_button.click()
    sleep(2)
    comment_box = driver.find_element_by_css_selector("textarea[aria-label='Add a comment…']")
    comment_box.click()
    comment_box = driver.find_element_by_css_selector("textarea[aria-label='Add a comment…']")
    comment_box.send_keys("xxxx")
    post_button = driver.find_element_by_xpath("//button[@type='submit']")
    post_button.click()
    sleep(2)
    driver.back()
    scroll()
However, this method gives the same error, saying that the page was refreshed and the object cannot be found.
selenium.common.exceptions.StaleElementReferenceException: Message: The element reference of <svg class="_8-yf5 "> is stale; either the element is no longer attached to the DOM, it is not in the current frame context, or the document has been refreshed
Edited:
Assuming the number of such elements does not change after the page refresh, you can use the code below:
commentbtns = driver.find_elements_by_css_selector("svg[aria-label='Comment']")
for n in range(1, len(commentbtns)+1):
    Path = "(//*[name()='svg'])[" + str(n) + "]"
    time.sleep(2)
    driver.find_element_by_xpath(Path).click()
You can use more sophisticated ways to wait for the element to load properly; for simplicity, I have used time.sleep here.
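For instance, here is a sketch of the same loop using an explicit wait instead of time.sleep (the 10-second timeout is an assumption):

from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

commentbtns = driver.find_elements_by_css_selector("svg[aria-label='Comment']")
for n in range(1, len(commentbtns) + 1):
    path = "(//*[name()='svg'])[" + str(n) + "]"
    # Wait until the nth svg is clickable instead of sleeping a fixed time
    WebDriverWait(driver, 10).until(
        EC.element_to_be_clickable((By.XPATH, path))).click()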
connections = driver.find_elements_by_css_selector("a span[class='mn-connection-card__name t-16 t-black t-bold']")
print(len(connections))
for connection in connections:
    if connection.text == "XXX":
        connection.click()
        break
I am getting the following error in the if statement:
stale element reference: element is not attached to the page document
A stale element exception happens when the properties of the element your script is trying to operate on have changed (for example, because the page reloaded). If you want to click a span with the text "XXX", you can wait for it and click it directly:
WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.XPATH, "//a[span[text()='XXX']]")))
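Since presence_of_element_located returns the matched element, you can chain the click on the wait's return value (assuming the element is clickable by the time it is present):

WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.XPATH, "//a[span[text()='XXX']]"))).click()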
If your requirement is to loop through all such elements, then:
connections = driver.find_elements_by_css_selector("a span[class='mn-connection-card__name t-16 t-black t-bold']")
print(len(connections))
for i in range(len(connections)):
    connections = driver.find_elements_by_css_selector("a span[class='mn-connection-card__name t-16 t-black t-bold']")  # Create a fresh element list each iteration, so it won't be stale
    if connections[i].text == "XXX":
        connections[i].click()
        break
I am trying to go through each product in my catalogue and print the product image links. Following is my code:
product_links = driver.find_elements_by_css_selector(".product-link")
for link in product_links:
    driver.get(link.get_attribute("href"))
    images = driver.find_elements_by_css_selector("#gallery img")
    for image in images:
        print(image.get_attribute("src"))
    driver.back()
But I am receiving the error selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document. I think this is happening because when we go back to the catalogue page, the page gets loaded again and the element references in product_links become stale.
How can we avoid this issue? Is there a better solution for this?
I ran into a similar problem, and here's how I solved it. Basically, you have to refresh the page and re-establish the list of links each time you return to it. Of course, doing this you can't use a for loop, because your objects are stale each time.
Unfortunately I can't test this, as I don't have access to your actual URL, but this should be close:
def get_prod_page(link):
    driver.get(link.get_attribute("href"))
    images = driver.find_elements_by_css_selector("#gallery img")
    for image in images:
        print(image.get_attribute("src"))
    driver.back()

counter = 0
link_count = len(driver.find_elements_by_css_selector(".product-link"))
while counter < link_count:
    product_links = driver.find_elements_by_css_selector(".product-link")[counter:]
    get_prod_page(product_links[0])
    counter += 1
    driver.refresh()
I am working on my first scraper and ran into an issue. My scraper accesses a website and saves links from each result page. Now, I only want it to go through 10 pages. The problem comes when the search results have fewer than 10 pages. I tried using a while loop along with a try statement, but it does not seem to work. After the scraper goes through the first page of results, it does not return any links on the successive pages; however, it does not give me an error, and it only stops once it reaches 10 pages or hits the exception.
Here is a snippet of my code:
links = []
page = 1
while page <= 10:
    try:
        # Get information from the propertyInfo class
        properties = WebDriverWait(driver, 10).until(lambda driver: driver.find_elements_by_xpath('//div[@class = "propertyInfo item"]'))
        # For each listing
        for p in properties:
            # Find all elements with a tags
            tmp_link = p.find_elements_by_xpath('.//a')
            # Get the link from the second element to avoid error
            links.append(tmp_link[1].get_attribute('href'))
        page += 1
        WebDriverWait(driver, 10).until(lambda driver: driver.find_element_by_xpath('//*[@id="paginador_siguiente"]/a').click())
    except ElementNotVisibleException:
        break
I really appreciate any pointers on how to fix this issue.
You are explicitly catching the ElementNotVisibleException exception and stopping on it; this way you won't see any error message. The error is probably in this line:
WebDriverWait(driver, 10).until(lambda driver:
    driver.find_element_by_xpath('//*[@id="paginador_siguiente"]/a').click())
I assume the lambda here should be a test that is run until it succeeds, so it shouldn't perform an action like a click. I actually believe you don't need to wait here at all; the page should already be fully loaded, so you can just click on the link:
driver.find_element_by_xpath('//*[@id="paginador_siguiente"]/a').click()
This will either move to the next page (and the WebDriverWait at the start of the loop will wait for it) or raise an exception if no next link is found.
Also, you'd better minimize the try ... except scope; this way you won't capture something unintentionally. E.g. here you only want to surround the next-link-finding code, not the whole loop body:
# ...
while page <= 10:
    # Scrape this page
    properties = WebDriverWait(driver, 10).until(...)
    for p in properties:
        # ...
    page += 1

    # Try to pass to the next page
    try:
        driver.find_element_by_xpath('//*[@id="paginador_siguiente"]/a').click()
    except ElementNotVisibleException:
        # Break if no next link is found
        break