Stale element exception in Selenium - Python

I am retrieving my link using the following code:
os.environ['MOZ_HEADLESS'] = '1'
binary = FirefoxBinary('C:\\Program Files\\Mozilla Firefox\\firefox.exe', log_file=sys.stdout)
self.driver = webdriver.Firefox(firefox_binary=binary, executable_path='C:/chromedriver/geckodriver')
self.driver.get(link)
Next, I call:
xpath=".//a[#class='tileLink']"
ignored_exceptions = (NoSuchElementException, StaleElementReferenceException,)
your_element = WebDriverWait(self.driver, 30, ignored_exceptions=ignored_exceptions).until(
expected_conditions.presence_of_element_located((By.XPATH, xpath)))
and then
links = self.driver.find_elements_by_xpath(".//a[@class='tileLink']")
for link in links:
    href_ = link.get_attribute("href")  # <<-- Error here
and the call to link.get_attribute("href") throws the stale element exception.
Given the WebDriverWait, I thought I would avoid this issue, yet it persists.
I am tempted to take the page source, once it has loaded, and throw it into lxml to avoid this issue completely (see the sketch after this question).
The time that passes between establishing links and iterating over them is a second at most.
Has anyone else experienced an issue like this, and found a solution?
Any guidance is appreciated.
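Here is a minimal sketch of the lxml fallback mentioned in the question, assuming lxml is installed and that the page has finished loading in self.driver:
from lxml import html

tree = html.fromstring(self.driver.page_source)
hrefs = tree.xpath("//a[@class='tileLink']/@href")  # plain strings, immune to staleness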

If I have understood your question correctly, you need to get the href attribute from all the elements identified by the XPath //a[@class='tileLink']. To achieve that you can use the following code block:
# xpath=".//a[#class='tileLink']"
# ignored_exceptions = (NoSuchElementException, StaleElementReferenceException,)
# links = WebDriverWait(self.driver, 30, ignored_exceptions=ignored_exceptions).until(expected_conditions.presence_of_element_located((By.XPATH, xpath)))
links = WebDriverWait(self.driver, 30).until(expected_conditions.visibility_of_all_elements_located((By.XPATH, "//a[#class='tileLink']")))
for link in links:
print(link.get_attribute("href"))
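If you want to keep the values rather than just print them, a simple comprehension over the same links collection works (a small sketch, not part of the original answer):
hrefs = [link.get_attribute("href") for link in links]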

I had a similar problem with some movable buttons on the page being stale.
How about something like:
from selenium.common.exceptions import StaleElementReferenceException
from selenium.webdriver.common.action_chains import ActionChains
hrefs = []
for index, link in enumerate(links):
    attempts = 0
    while True:
        try:
            for action in range(0, 10):
                ActionChains(context.browser). \
                    move_to_element(links[index]).click().perform()
                href_ = link.get_attribute("href")
                if href_:
                    hrefs.append(href_)
                    break
            break
        except StaleElementReferenceException:
            attempts += 1
            if attempts > 10:
                break
I realise this is a very crude (primitive, even) solution, and assumes that the element becoming "un-stale" is a timing issue.
I'm also not that good with Python, so this may need some tweaking...
Thinking about it, since those elements are links, perhaps you don't want to click them, in which case delete the click() bit from the ActionChains line, or perhaps change it to context_click().
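A different sketch, not from the original answer, that avoids ActionChains entirely: re-locate the element by index on every retry, so the reference can never go stale between the lookup and the attribute read (this assumes the same .//a[@class='tileLink'] locator and a driver object):
from selenium.common.exceptions import StaleElementReferenceException

hrefs = []
count = len(driver.find_elements_by_xpath(".//a[@class='tileLink']"))
for index in range(count):
    for attempt in range(10):
        try:
            # Re-find the element on each attempt so the reference is always fresh.
            fresh = driver.find_elements_by_xpath(".//a[@class='tileLink']")[index]
            hrefs.append(fresh.get_attribute("href"))
            break
        except (StaleElementReferenceException, IndexError):
            pass  # The DOM changed under us; retry with a fresh lookup.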


How can I check if an element exists on a page using Selenium XPath?

I'm writing a script to do some web scraping on my Firebase for a few select users. After accessing the events page for a user, I want to check first for the condition that no events have been logged by that user.
For this, I am using Selenium and Python. Using XPath seems to work fine for locating links and navigation in all other parts of the script, except for accessing elements in a table. At first, I thought I might have been using the wrong XPath expression, so I copied the path directly from Chrome's inspection window, but still no luck.
As an alternative, I have tried to copy the page source and pass it into Beautiful Soup, and then parse it there to check for the element. No luck there either.
Here's some of the code, and some of the HTML I'm trying to parse. Where am I going wrong?
# Using WebDriver - always triggers an exception
def check_if_user_has_any_data():
    try:
        time.sleep(10)
        element = WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.XPATH, '//*[@id="event-table"]/div/div/div[2]/mobile-table/md-whiteframe/div[1]/ga-no-data-table/div')))
        print(type(element))
        if element == True:
            print("Found empty state by copying XPath expression directly. It is a bit risky, but it seems to have worked")
        else:
            print("didn't find empty state")
    except:
        print("could not find the empty state element", EC)
# Using Beautiful Soup
def check_if_user_has_any_data_2():
    time.sleep(10)
    html = driver.execute_script("return document.documentElement.outerHTML")
    soup = BeautifulSoup(html, 'html.parser')
    print(soup.text[:500])
    print(len(soup.findAll('div', {"class": "table-row-no-data ng-scope"})))
HTML
<div class="table-row-no-data ng-scope" ng-if="::config" ng-class="{overlay: config.isBuilderOpen()}">
<div class="no-data-content layout-align-center-center layout-row" layout="row" layout-align="center center">
<!-- ... -->
</div>
The first version triggers the exception. It is expected to evaluate 'element' as True; in actuality, the element is not found.
The second version prints the first 500 characters (correctly, as far as I can tell), but it returns '0'. It is expected to return '1' after inspecting the page source.
Use the following code:
elements = driver.find_elements_by_xpath("//*[@id='event-table']/div/div/div[2]/mobile-table/md-whiteframe/div[1]/ga-no-data-table/div")
size = len(elements)
if size > 0:
    pass  # Element is present. Do your action.
else:
    pass  # Element is not present. Do your alternative action.
Note: find_elements will not throw any exception when nothing matches; it simply returns an empty list.
Here is the method that I generally use.
Imports
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By
Method
def is_element_present(self, how, what):
    try:
        self.driver.find_element(by=how, value=what)
    except NoSuchElementException as e:
        return False
    return True
Some things load dynamically. In those cases it is better to wait with a timeout and handle the timeout exception.
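For instance, a wait-based presence check could look like this (a sketch, assuming a driver object and a CSS selector of your choice):
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait

def is_element_present_within(driver, css_selector, timeout=10):
    try:
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector)))
        return True
    except TimeoutException:
        return False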
If you're using Python and Selenium, you can use this:
try:
    driver.find_element_by_xpath("<Full XPath expression>")  # Test whether the element exists
    # <Other code>
except:
    pass  # <Run this if the element doesn't exist>
I've solved it. The page had a bunch of different iframe elements, and I didn't know that one had to switch between frames in Selenium to access those elements.
There was nothing wrong with the initial code, or the suggested solutions which also worked fine when I tested them.
Here's the code I used to test it:
# Time for the page to load
time.sleep(20)
# Find all iframes
iframes = driver.find_elements_by_tag_name("iframe")
# From inspecting the page source, it looks like the index for the relevant iframe is [0]
x = len(iframes)
print("Found ", x, " iFrames")  # Should return 5
driver.switch_to.frame(iframes[0])
print("switched to frame [0]")
if WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.XPATH, '//*[@class="no-data-title ng-binding"]'))):
    print("Found it in this frame!")
Check the length of the element collection you are retrieving with an if statement.
Example:
elements = driver.find_elements_by_xpath("<your XPath>")
if len(elements) > 0:
    pass  # Do something.

Continue when an element is not found in Selenium Python

The following script follows a page in Instagram:
browser = webdriver.Chrome('./chromedriver')
# GO INSTAGRAM PAGE FOR LOGIN
browser.get('https://www.instagram.com/accounts/login/?hl=it')
sleep(2)
# ID AND PASSWORD
elem = browser.find_element_by_name("username").send_keys('test')
elem = browser.find_element_by_name("password").send_keys('passw')
# CLICK BUTTON AND OPEN INSTAGRAM
sleep(5)
good_elem = browser.find_element_by_xpath('//*[@id="react-root"]/section/main/div/article/div/div[1]/div/form/span/button').click()
sleep(5)
browser.get("https://www.instagram.com")
# GO TO PAGE FOR FOLLOW
browser.get("https://www.instagram.com/iam.ai4/")
sleep(28)
segui = browser.find_element_by_class_name('BY3EC').click()
If an element with class BY3EC isn't found, I want the script to keep working.
When an element is not found, Selenium throws NoSuchElementException, so you can use try/except to handle that, for example:
from selenium.common.exceptions import NoSuchElementException

try:
    segui = browser.find_element_by_class_name('BY3EC').click()
except NoSuchElementException:
    print('Element BY3EC not found')  # or do something else here
You can take a look at selenium exceptions to get an idea of what each one of them is for.
Surround it with try/except blocks; then you can build a happy path and handle failures as well, so your test case will always work.
Best practice is to not use Exceptions to control flow. Exceptions should be exceptional... rare and unexpected. The simple way to do this is to get a collection using the locator and then see if the collection is empty. If it is, you know the element doesn't exist.
In the example below we search the page for the element you wanted and check to see that the collection contains an element, if it does... click it.
segui = browser.find_elements_by_class_name('BY3EC')
if segui:
    segui[0].click()

YouTube scraping with Selenium: not getting all comments

I am trying to scrape YouTube comments using Selenium with Python. Below is the code, which scrapes just one comment and then throws an error:
driver = webdriver.Chrome()
url="https://www.youtube.com/watch?v=MNltVQqJhRE"
driver.get(url)
wait(driver, 5500)
driver.execute_script("window.scrollTo(0, document.body.scrollHeight + 500);")
driver.implicitly_wait(5000)
#content = driver.find_element_by_xpath('//*[#id="contents"]')
comm = driver.find_element_by_xpath('//div[@class="style-scope ytd-item-section-renderer"]')
comm1 = comm.find_elements_by_xpath('//yt-formatted-string[@id="content-text"]')
#print(comm.text)
for i in range(50):
    print(comm1[i].text, end=' ')
This is the output I am getting. How do I get all the comments on that page? Can anyone help me with this?
Being a sucessful phyton freelancer really mean to me because if I able to make $2000 in month I can really help my family financial, improve my skill, and have a lot of time to refreshing. So thanks Qazi, you really help me :D
Traceback (most recent call last):
File "C:\Python36\programs\Web scrap\YT_Comm.py", line 19, in <module>
print(comm1[i].text,end=' ')
IndexError: list index out of range
An IndexError means you’re attempting to access a position in a list that doesn’t exist. You’re iterating over your list of elements (comm1) exactly 50 times, but there are fewer than 50 elements in the list, so eventually you attempt to access an index that doesn’t exist.
Superficially, you can solve your problem by changing your iteration to loop over exactly as many elements as exist in your list—no more and no less:
for element in comm1:
    print(element.text, end=' ')
But that leaves you with the problem of why your list has fewer than 50 elements. The video you’re scraping has over 90 comments. Why doesn’t your list have all of them?
If you take a look at the page in your browser, you'll see that the comments load progressively using the infinite scroll technique: when the user scrolls to the bottom of the document, another "page" of comments are fetched and rendered, increasing the length of the document. To load more comments, you will need to trigger this behavior.
But depending on the number of comments, one fetch may not be enough. In order to trigger the fetch and rendering of all of the content, then, you will need to:
attempt to trigger a fetch of additional content, then
determine whether additional content was fetched, and, if so,
repeat (because there might be even more).
Triggering a fetch
We already know that additional content is fetched by scrolling to the bottom of the content container (the element with id #contents), so let's do that:
driver.execute_script(
"window.scrollTo(0, document.querySelector('#contents').scrollHeight);")
(Note: Because the content resides in an absolute-positioned element, document.body.scrollHeight will always be 0 and will not trigger a scroll.)
Waiting for the content container
But as with any browser automation, we're in a race with the application: What if the content container hasn't rendered yet? Our scroll would fail.
Selenium provides WebDriverWait() to help you wait for the application to be in a particular state. It also provides, via its expected_conditions module, a set of common states to wait for, such as the presence of an element. We can use both of these to wait for the content container to be present:
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait
TIMEOUT_IN_SECONDS = 10
wait = WebDriverWait(driver, TIMEOUT_IN_SECONDS)
wait.until(
EC.presence_of_element_located((By.CSS_SELECTOR, "#contents")))
Determining whether additional content was fetched
At a high level, we can determine whether additional content was fetched by:
counting the content before we trigger the fetch,
counting the content after we trigger the fetch, then
comparing the two.
Counting the content
Within our container (with id "#contents"), each piece of content has id #content. To count the content, we can simply fetch each of those elements and use Python's built-in len():
count = len(driver.find_elements_by_css_selector("#contents #content"))
Handling a slow render
But again, we're in a race with the application: What happens if either the fetch or the render of additional content is slow? We won't immediately see it.
We need to give the web application time to do its thing. To do this, we can use WebDriverWait() with a custom condition:
def get_count():
    return len(driver.find_elements_by_css_selector("#contents #content"))

count = get_count()
# ...
wait.until(
    lambda _: get_count() > count)
Handling no additional content
But what if there isn't any additional content? Our wait for the count to increase will timeout.
As long as our timeout is high enough to allow sufficient time for the additional content to appear, we can assume that there is no additional content and ignore the timeout:
try:
    wait.until(
        lambda _: get_count() > count)
except TimeoutException:
    # No additional content appeared. Abort our loop.
    break
Putting it all together
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait
TIMEOUT_IN_SECONDS = 10
wait = WebDriverWait(driver, TIMEOUT_IN_SECONDS)
driver.get(URL)
wait.until(
    EC.presence_of_element_located((By.CSS_SELECTOR, "#contents")))

def get_count():
    return len(driver.find_elements_by_css_selector("#contents #content"))

while True:
    count = get_count()
    driver.execute_script(
        "window.scrollTo(0, document.querySelector('#contents').scrollHeight);")
    try:
        wait.until(
            lambda _: get_count() > count)
    except TimeoutException:
        # No additional content appeared. Abort our loop.
        break

elements = driver.find_elements_by_css_selector("#contents #content")
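To get back to the original goal of printing every comment, you can then read the text of each collected element (a small sketch using the elements list from above):
for element in elements:
    print(element.text)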
Bonus: Simplifying with capybara-py
With capybara-py, this becomes a bit simpler:
import capybara
from capybara.dsl import page
from capybara.exceptions import ExpectationNotMet

@capybara.register_driver("selenium_chrome")
def init_selenium_chrome_driver(app):
    from capybara.selenium.driver import Driver
    return Driver(app, browser="chrome")

capybara.current_driver = "selenium_chrome"
capybara.default_max_wait_time = 10

page.visit(URL)
contents = page.find("#contents")
elements = []
while True:
    try:
        elements = contents.find_all("#content", minimum=len(elements) + 1)
    except ExpectationNotMet:
        # No additional content appeared. Abort our loop.
        break
    page.execute_script(
        "window.scrollTo(0, arguments[0].scrollHeight);", contents)

Unable to fetch all the necessary links during Iteration - Selenium Python

I am a newbie to Selenium with Python. I am trying to fetch the profile URLs, which will be 10 per page. Without using while, I am able to fetch all 10 URLs, but only for the first page. When I use while, it iterates, but fetches only 3 or 4 URLs per page.
I need to fetch all 10 links and keep iterating through the pages. I think I must do something about StaleElementReferenceException.
Kindly help me solve this problem.
Given the code below.
def test_connect_fetch_profiles(self):
    driver = self.driver
    search_data = driver.find_element_by_id("main-search-box")
    search_data.clear()
    search_data.send_keys("Selenium Python")
    search_submit = driver.find_element_by_name("search")
    search_submit.click()
    noprofile = driver.find_elements_by_xpath("//*[text() = 'Sorry, no results containing all your search terms were found.']")
    self.assertFalse(noprofile)
    while True:
        wait = WebDriverWait(driver, 150)
        try:
            profile_links = wait.until(EC.presence_of_all_elements_located((By.XPATH, "//*[contains(@href,'www.linkedin.com/profile/view?id=')][text()='LinkedIn Member' or contains(@href,'Type=NAME_SEARCH')][contains(@class,'main-headline')]")))
            for each_link in profile_links:
                page_links = each_link.get_attribute('href')
                print(page_links)
                driver.implicitly_wait(15)
                appendFile = open("C:\\Users\\jayaramb\\Documents\\profile-links.csv", 'a')
                appendFile.write(page_links + "\n")
                appendFile.close()
                driver.implicitly_wait(15)
            next = wait.until(EC.visibility_of(driver.find_element_by_partial_link_text("Next")))
            if next.is_displayed():
                next.click()
            else:
                print("End of Page")
                break
        except ValueError:
            print("It seems no values to fetch")
        except NoSuchElementException:
            print("No Elements to Fetch")
        except StaleElementReferenceException:
            print("No Change in Element Location")
        else:
            break
Please let me know if there are any other effective ways to fetch the required profile URL and keep iterating through pages.
I created a similar setup which works alright for me. I've had some problems with Selenium trying to click on the next button and throwing a WebDriverException instead, likely because the next button was not in view. Hence, instead of clicking the next button I get its href attribute and load the new page with driver.get(), thus avoiding an actual click and making the test more stable.
def test_fetch_google_links():
    links = []
    # Setup driver
    driver = webdriver.Firefox()
    driver.implicitly_wait(10)
    driver.maximize_window()
    # Visit google
    driver.get("https://www.google.com")
    # Enter search query
    search_data = driver.find_element_by_name("q")
    search_data.send_keys("test")
    # Submit search query
    search_button = driver.find_element_by_xpath("//button[@type='submit']")
    search_button.click()
    while True:
        # Find and collect all anchors
        anchors = driver.find_elements_by_xpath("//h3//a")
        links += [a.get_attribute("href") for a in anchors]
        try:
            # Find the next page button
            next_button = driver.find_element_by_xpath("//a[@id='pnnext']")
            location = next_button.get_attribute("href")
            driver.get(location)
        except NoSuchElementException:
            break
    # Do something with the links
    for l in links:
        print(l)
    print("Found {} links".format(len(links)))
    driver.quit()

Scraper: Try skips code in while loop (Python)

I am working on my first scraper and ran into an issue. My scraper accesses a website and saves links from each result page. Now, I only want it to go through 10 pages. The problem comes when the search results have fewer than 10 pages. I tried using a while loop along with a try statement, but it does not seem to work. After the scraper goes through the first page of results, it does not return any links on the successive pages; however, it does not give me an error and stops once it reaches 10 pages or hits the exception.
Here is a snippet of my code:
links = []
page = 1
while(page <= 10):
    try:
        # Get information from the propertyInfo class
        properties = WebDriverWait(driver, 10).until(lambda driver: driver.find_elements_by_xpath('//div[@class = "propertyInfo item"]'))
        # For each listing
        for p in properties:
            # Find all elements with a tags
            tmp_link = p.find_elements_by_xpath('.//a')
            # Get the link from the second element to avoid error
            links.append(tmp_link[1].get_attribute('href'))
        page += 1
        WebDriverWait(driver, 10).until(lambda driver: driver.find_element_by_xpath('//*[@id="paginador_siguiente"]/a').click())
    except ElementNotVisibleException:
        break
I really appreciate any pointers on how to fix this issue.
You are explicitly catching the ElementNotVisibleException exception and stopping on it. This way you won't see any error message. The error is probably in this line:
WebDriverWait(driver, 10).until(lambda driver:
    driver.find_element_by_xpath('//*[@id="paginador_siguiente"]/a').click())
I assume the lambda here should be a test, which is run until it succeeds, so it shouldn't perform an action like click. I actually believe that you don't need to wait here at all; the page should already be fully loaded, so you can just click on the link:
driver.find_element_by_xpath('//*[@id="paginador_siguiente"]/a').click()
This will either pass to the next page (and the WebDriverWait at the start of the loop will wait for it) or raise a NoSuchElementException if no next link is found.
Also, you should minimize the try ... except scope; that way you won't capture something unintentionally. For example, here you only want to surround the next-link-finding code, not the whole loop body:
# ...
while(page <= 10):
    # Scrape this page
    properties = WebDriverWait(driver, 10).until(...)
    for p in properties:
        # ...
    page += 1
    # Try to pass to next page
    try:
        driver.find_element_by_xpath('//*[@id="paginador_siguiente"]/a').click()
    except ElementNotVisibleException:
        # Break if no next link is found
        break
