I am scraping a website with Selenium and sending an alert if something specific happens. Generally my code works fine, but sometimes the website doesn't load its elements, or it shows an error message like: "Sorry, something went wrong! Please refresh the page and try again!" In both cases my script waits for the elements to load, they never do, and the program just sits there. I usually use requests and BeautifulSoup for web scraping, so I am not that familiar with Selenium, and I am not sure how to handle these errors: my code doesn't raise any error message, it simply waits for elements that will likely never appear. If I manually refresh the page, the program continues to work. My idea would be something like: if the page takes more than 10 seconds to load, refresh it and try again.
My code looks somewhat like this:
def get_data():
    data_list = []
    while len(data_list) < 3:
        try:
            data = driver.find_elements_by_class_name('text-color-main-secondary.text-sm.font-bold.text-left')
            count = len(data)
            data_list.append(data)
            driver.implicitly_wait(2)
            time.sleep(.05)
            driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
            WebDriverWait(driver, 3).until(EC.visibility_of_element_located(
                (By.CLASS_NAME, 'text-color-main-secondary.text-sm.font-bold.text-left'.format(str(count + 1)))))
        except TimeoutException:
            break
    text = []
    elements = []
    for i in range(len(data_list)):
        for j in range(len(data_list[i])):
            t = data_list[i][j].text
            elements.append(data_list[i][j])
            for word in t.split():
                if '#' in word:
                    text.append(word)
    return text, elements
option = webdriver.ChromeOptions()
option.add_extension('')
path = ''
driver = webdriver.Chrome(executable_path=path, options=option)
driver.get('')
login(passphrase)
driver.switch_to.window(driver.window_handles[0])
while True:
    try:
        infos, elements = get_data()
        data, message = check_data(infos, elements)
        if data:
            send_alert(message)
        time.sleep(600)
        driver.refresh()
    except Exception as e:
        exception_type, exception_object, exception_traceback = sys.exc_info()
        line_number = exception_traceback.tb_lineno
        print("an exception occurred - {}".format(e) + " in line: " + str(line_number))
You can use try/except to overcome this problem. First, wait up to 10 seconds for the element to appear; if it is not present by then, refresh the page. Here is a basic version of the code:
try:
    # wait up to 10s for the element to load; if it does not, control jumps to the except block
    WebDriverWait(driver, 10).until(EC.visibility_of_element_located(
        (By.CLASS_NAME, 'text-color-main-secondary.text-sm.font-bold.text-left'.format(str(count + 1)))))
except TimeoutException:
    driver.refresh()
    # locate the element here again
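Building on that, the wait-refresh-retry idea can be wrapped in a small loop. This is a minimal sketch, assuming the locator from the question and a capped number of refreshes (the function name and retry count are my own, not from the answer):

def wait_or_refresh(driver, timeout=10, max_retries=3):
    # the locator is copied from the question; adjust it to the real page
    locator = (By.CLASS_NAME, 'text-color-main-secondary.text-sm.font-bold.text-left')
    for _ in range(max_retries):
        try:
            # returns the element as soon as it becomes visible
            return WebDriverWait(driver, timeout).until(
                EC.visibility_of_element_located(locator))
        except TimeoutException:
            # covers both the stuck page and the "Sorry, something went wrong!" page
            driver.refresh()
    raise TimeoutException("element never appeared after %d refreshes" % max_retries)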
The for loop stops in case of an exception when using the line:
page.close()
from selenium import webdriver

page = webdriver.Chrome("chromedriver")
page.maximize_window()

def test():
    for i in range(10):
        page.execute_script("window.open()")
        page.switch_to.window(page.window_handles[i + 1])
        page.get(f"https://haraj.com.sa/119174396{i}")
        try:
            Object = page.find_element_by_class_name("contact")
            Object.click()
        except:
            page.close()
            print("Not find element ")

test()
If the element ("contact") is found, it is clicked, and the page stays open in a browser tab.
If the element ("contact") is not found, the page should be closed and the for loop should continue.
If I comment out page.close(), the for loop continues, the page I want to close stays open in a browser tab, and print("Not find element ") is executed.
Are there other ways to close the page that does not contain the element ("contact") and continue the for loop?
The main problem in your code is that you close the browser window and then try to run commands against it, which generates an error.
There are two solutions below.
Solution 1:
from selenium import webdriver

page = webdriver.Chrome("chromedriver")
page.maximize_window()

def test():
    global page  # needed because page is reassigned inside this function
    for i in range(10):
        page.execute_script("window.open()")
        page.switch_to.window(page.window_handles[i + 1])
        page.get(f"https://haraj.com.sa/119174396{i}")
        try:
            Object = page.find_element_by_class_name("contact")
            Object.click()
        except:
            page.close()
            print("Not find element ")
            page = webdriver.Chrome("chromedriver")
            page.maximize_window()

test()
When the element ('contact') is not found, the code above closes the browser, reopens it, and continues execution.
Solution 2:
from selenium import webdriver

page = webdriver.Chrome("chromedriver")
page.maximize_window()

def test():
    for i in range(10):
        page.execute_script("window.open()")
        page.switch_to.window(page.window_handles[i + 1])
        page.get(f"https://haraj.com.sa/119174396{i}")
        try:
            Object = page.find_element_by_class_name("contact")
            Object.click()
        except:
            print("Not find element ")

test()
The code above never closes the browser, so the state remains the same as before the exception and execution continues.
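As another option, which is my own suggestion rather than part of the answer above: page.close() only closes the current tab, so you can keep the browser alive by handing focus back to an open window before the next iteration. A sketch under that assumption:

def test():
    for i in range(10):
        page.execute_script("window.open()")
        page.switch_to.window(page.window_handles[-1])  # focus the tab just opened
        page.get(f"https://haraj.com.sa/119174396{i}")
        try:
            page.find_element_by_class_name("contact").click()
        except Exception:
            print("Not find element ")
            page.close()  # closes only the current tab, not the browser
            # switch back to the first window so the next iteration has a
            # live window to run commands against
            page.switch_to.window(page.window_handles[0])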
I am writing a web scraper that works through a list of links from a CSV file. The problem is that some of the pages it visits are missing part of the information it scrapes, because the company doesn't provide it. So if the program is scraping phone numbers and emails and the phone number is missing, it raises an exception and terminates. I need it to skip the missing element and NOT terminate the program, so it can scrape the rest of the information, with the missing value represented as an empty slot in the CSV file.
This is an example webpage that contains all the information and successfully scrapes - [https://reality.idnes.cz/rk/detail/m-m-reality-holding-a-s/5a85b582a26e3a321d4f2700/]
This is an example webpage that is missing both email and number, and causes an exception that terminates the program - [https://reality.idnes.cz/rk/detail/narodni-realitni-holding-a-s/5a88aab9e88054474b0eca61/]
I have tried using try/except with pass to fight this, so that it would skip the exception and continue running the program, but the program just skips the exception altogether, goes back to the beginning of the loop, and completely skips the information, which then never gets saved to the CSV file.
This is my code:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as ec
from selenium.webdriver.chrome.options import Options
import time
import csv

with open('links.csv') as read:
    reader = csv.reader(read)
    link_list = list(reader)

with open('ScrapedContent.csv', 'w+', newline='') as write:
    writer = csv.writer(write)

    options = Options()
    options.add_argument('--no-sandbox')

    path = "/home/Projects/SRealityContentScraper/chromedriver"
    driver = webdriver.Chrome(path)
    wait = WebDriverWait(driver, 10)

    for link in link_list:
        driver.get(', '.join(link))
        time.sleep(2)
        information_list = wait.until(ec.presence_of_element_located((By.CSS_SELECTOR, "h1.b-annot__title.mb-5")))
        title = driver.find_element_by_css_selector("h1.b-annot__title.mb-5")
        information_list = wait.until(ec.presence_of_element_located((By.CSS_SELECTOR, "span.btn__text")))
        offers = driver.find_element_by_css_selector("span.btn__text")
        information_list = wait.until(ec.presence_of_element_located((By.CSS_SELECTOR, "p.font-sm")))
        addresses = driver.find_element_by_css_selector("p.font-sm")
        try:
            information_list = wait.until(ec.presence_of_element_located((By.CSS_SELECTOR, "a.item-icon.measuring-data-layer")))
            phone_number = driver.find_element_by_css_selector("a.item-icon.measuring-data-layer")
        except Exception:
            pass
        try:
            information_list = wait.until(ec.presence_of_element_located((By.CSS_SELECTOR, "a.item-icon")))
            email = driver.find_element_by_css_selector("a.item-icon")
        except Exception:
            pass
        print(title.text, " ", offers.text, " ", addresses.text, " ", phone_number.text, " ", email.text)
        writer.writerow([title.text, offers.text, addresses.text, phone_number.text, email.text])

    driver.quit()
To my understanding, except with continue should restart the loop from the beginning, and except with pass should just ignore the exception and continue running the program normally, which doesn't happen here. How can I prevent this data loss, so the program saves the information it finds and leaves the missing parts blank? Thanks for any help; I have been trying to figure this out for hours now!
As mentioned in the comments, your problem is that your elements are defined within the try statement.
When the try fails, they are never created, and further down the loop you cannot access their .text.
As an example, look at this simple snippet, where the lookup will always fail:
driver.get("http://www.google.com")
try:
    someElement = driver.find_element_by_name('I will Fail!')
except Exception:
    pass
print(someElement.text)
This throws this error on the print:
Exception has occurred: NameError name 'someElement' is not defined
One solution is to create a string variable before the try - set this to blank or whatever default value you want. In the try, set that to text from the element.
If it cannot be found, the try bails and the text value remains as its default.
The code from above becomes this:
someElement_Text = ""  # use a string like "" or "none" or "blank" or "not found"
try:
    someElement = driver.find_element_by_name('I will Fail!')
    someElement_Text = someElement.text
except Exception:
    pass
print(someElement_Text)
This code does not error.
You'll want to change the latter half of your loop to this:
phone_number_text = ""
try:
    information_list = wait.until(ec.presence_of_element_located((By.CSS_SELECTOR, "a.item-icon.measuring-data-layer")))
    phone_number = driver.find_element_by_css_selector("a.item-icon.measuring-data-layer")
    phone_number_text = phone_number.text
except Exception:
    pass

email_text = ""
try:
    information_list = wait.until(ec.presence_of_element_located((By.CSS_SELECTOR, "a.item-icon")))
    email = driver.find_element_by_css_selector("a.item-icon")
    email_text = email.text
except Exception:
    pass

print(title.text, " ", offers.text, " ", addresses.text, " ", phone_number_text, " ", email_text)
writer.writerow([title.text, offers.text, addresses.text, phone_number_text, email_text])
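If this pattern repeats over many fields, it can be folded into a small helper. This is a sketch; the helper name and the default value are mine, not from the answer:

def scrape_text(wait, driver, css, default=""):
    # returns the element's text, or the default when it never appears
    try:
        wait.until(ec.presence_of_element_located((By.CSS_SELECTOR, css)))
        return driver.find_element_by_css_selector(css).text
    except Exception:
        return default

phone_number_text = scrape_text(wait, driver, "a.item-icon.measuring-data-layer")
email_text = scrape_text(wait, driver, "a.item-icon")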
I have made a proxy checker in Python in combination with Selenium, so every time it opens the Selenium browser it uses a different proxy. But not all the proxies work, and I'm stuck with the page loading forever if the proxy is slow. Checking for a string as a key doesn't work either, because the page never finishes loading. Is there a function in Python that lets me do something like: if the page is not fully loaded within 10 seconds, go to the next proxy? Thanks in advance!
My code so far:
# PROXY SETUP FOR THIS PROGRAM
def fly_setup(fly_url):
    fly_options = webdriver.ChromeOptions()
    fly_options.add_experimental_option("prefs", {
        "profile.default_content_setting_values.notifications": 1
    })
    with open("proxies.txt") as fly_proxies:
        lines = fly_proxies.readlines()
        counter = 0
        for proxy in lines:
            fly_options.add_argument('--proxy-server=%s' % proxy.rstrip())
            ad_chrome = webdriver.Chrome(options=fly_options)
            ad_chrome.get(fly_url)
            ad_source = ad_chrome.page_source
            key = 'Vind ik leuk'
            time.sleep(10)
            if ad_chrome.set_page_load_timeout(10):
                print("Page load took too long.. Going to next proxy ")
            else:
                if key not in ad_source:
                    print("Proxy not working! Going to next one ...")
                    ad_chrome.quit()
                    time.sleep(3)
                else:
                    time.sleep(10)
                    ad_chrome.find_element_by_xpath('//*[@id="skip_bu2tton"]').click()
                    counter += 1
                    print("Total views : " + str(counter))
                    print("")
                    ad_chrome.quit()
                    time.sleep(3)
You can set a timeout limit using set_page_load_timeout, like:
driver.set_page_load_timeout(10)
If the page cannot be loaded within 10 seconds, it will throw a TimeoutException (see the docs); catch it and then switch to your next proxy.
In your code, if I assume lines contains all proxies, you can do something like this:
for proxy in lines:
    fly_options.add_argument('--proxy-server=%s' % proxy.rstrip())
    ad_chrome = webdriver.Chrome(options=fly_options)
    ad_chrome.set_page_load_timeout(10)
    try:
        ad_chrome.get(fly_url)
    except TimeoutException:
        continue
This solution doesn't always work, especially when the page loads data using AJAX calls. In this case, bet on selenium's waits, wait for something that is only presented/clickable when the whole page finishes loading, then same idea, catch TimeoutException and continue your loop.
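A sketch of that wait-based variant; the CSS selector here is a placeholder for whatever element only appears once the page has really finished loading, not something taken from the real page:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException

for proxy in lines:
    fly_options.add_argument('--proxy-server=%s' % proxy.rstrip())
    ad_chrome = webdriver.Chrome(options=fly_options)
    ad_chrome.set_page_load_timeout(10)
    try:
        ad_chrome.get(fly_url)
        # wait for an element that only exists after the AJAX content arrives
        WebDriverWait(ad_chrome, 10).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, "#main-content")))
    except TimeoutException:
        print("Proxy too slow, going to the next one ...")
        ad_chrome.quit()
        continue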
I want to play videos automatically through this page by clicking the Next button. However, at the end of each chapter there is an exercise page without a video, and I want to skip it.
The skip-to-next-chapter button element is on every page, just not visible.
(1) on exercise page, wait for the page to be loaded
(2) find the skip-to-next-chapter button and click on it
(3) on the video page, skip-to-next-chapter is not visible, so skip this block
However, I can not catch any exceptions, so the process gets stuck at the next_ = driver.find_element_by_xpath('//*[foo]') line. This line doesn't return anything and runs forever, and it won't throw a Timeout exception.
How can I debug this?
try:
    myElem = WebDriverWait(driver, delay).until(EC.presence_of_element_located((By.ID, 'myID')))
    next_ = driver.find_element_by_xpath('//*[foo]')
    next_.click()
except (NoSuchElementException, ElementNotVisibleException, TimeoutException):
    print('skip this')
I changed it to:
try:
    WebDriverWait(driver, 1).until(
        EC.element_to_be_clickable((By.XPATH, '//*[contains(concat(" ", @class, " "), concat(" ", "skip-to-next-chapter", " "))]'))
    ).click()
except TimeoutException:
    pass
But it still does not work.
This is the final stop point when debugging in PyCharm (screenshot omitted).
When stepping into the EC.element_to_be_clickable((By.XPATH, '//*[contains(concat(" ", @class, " "), concat(" ", "skip-to-next-chapter", " "))]')) line, it goes into wait.py:
def until(self, method, message=''):
    """Calls the method provided with the driver as an argument until the \
    return value is not False."""
    screen = None
    stacktrace = None
    end_time = time.time() + self._timeout
    while True:
        try:
            value = method(self._driver)  # <<<< stopped here!!
            if value:
                return value
        except self._ignored_exceptions as exc:
            screen = getattr(exc, 'screen', None)
            stacktrace = getattr(exc, 'stacktrace', None)
        time.sleep(self._poll)
        if time.time() > end_time:
            break
    raise TimeoutException(message, screen, stacktrace)
You have to take care of a couple of things in your code block. You have tried to handle three exceptions, and among them NoSuchElementException and ElementNotVisibleException look like pure overhead to me, for the following reasons:
First of all, I am still trying to understand the logic behind waiting for one element, (By.ID, 'myID'), but then moving ahead and clicking a different one, find_element_by_xpath('//*[foo]').
If your code block is generating NoSuchElementException, we definitely have to look at the locator strategy you have adopted, check whether it uniquely identifies the element, and also cross-check that the element is within the viewport.
If your code block is generating ElementNotVisibleException, we have to consider that factor as well when we pick the EC clause, e.g., presence_of_element_located.
Finally, as you are attempting to invoke the click() method on the element, instead of the EC clause presence_of_element_located you should use element_to_be_clickable(locator).
So, to wait for the element and then click it, your code block will be like:
try:
    WebDriverWait(driver, delay).until(EC.element_to_be_clickable((By.ID, 'myID'))).click()
except TimeoutException:
    print('skip this')
I still don't know what's wrong with my code: why doesn't WebDriver return anything when it cannot find the element? Anyway, I worked around it another way:
Use Beautiful Soup to parse the page source
Check if the button exists
If it exists → click it through the driver
If not → skip
src = driver.page_source
soup = BeautifulSoup(src, 'lxml')
next_chap = soup.find('button', class_="btn btn-link skip-to-next-chapter ga")
if next_chap is not None:
    try:
        driver.find_element_by_css_selector('.btn.btn-link.skip-to-next-chapter.ga').click()
    except Exception as e:
        print(e)
else:
    print("button does not exist, skip")
I am a newbie to Selenium with Python. I am trying to fetch profile URLs, which come 10 per page. Without using while, I am able to fetch all 10 URLs, but only for the first page. When I use while, it iterates through the pages but fetches only 3 or 4 URLs per page.
I need to fetch all 10 links and keep iterating through the pages. I think I must do something about StaleElementReferenceException.
Kindly help me solve this problem.
Given the code below.
def test_connect_fetch_profiles(self):
    driver = self.driver
    search_data = driver.find_element_by_id("main-search-box")
    search_data.clear()
    search_data.send_keys("Selenium Python")
    search_submit = driver.find_element_by_name("search")
    search_submit.click()
    noprofile = driver.find_elements_by_xpath("//*[text() = 'Sorry, no results containing all your search terms were found.']")
    self.assertFalse(noprofile)
    while True:
        wait = WebDriverWait(driver, 150)
        try:
            profile_links = wait.until(EC.presence_of_all_elements_located((By.XPATH, "//*[contains(@href,'www.linkedin.com/profile/view?id=')][text()='LinkedIn Member' or contains(@href,'Type=NAME_SEARCH')][contains(@class,'main-headline')]")))
            for each_link in profile_links:
                page_links = each_link.get_attribute('href')
                print(page_links)
                driver.implicitly_wait(15)
                appendFile = open("C:\\Users\\jayaramb\\Documents\\profile-links.csv", 'a')
                appendFile.write(page_links + "\n")
                appendFile.close()
                driver.implicitly_wait(15)
            next = wait.until(EC.visibility_of(driver.find_element_by_partial_link_text("Next")))
            if next.is_displayed():
                next.click()
            else:
                print("End of Page")
                break
        except ValueError:
            print("It seems no values to fetch")
        except NoSuchElementException:
            print("No Elements to Fetch")
        except StaleElementReferenceException:
            print("No Change in Element Location")
        else:
            break
Please let me know if there are any other effective ways to fetch the required profile URL and keep iterating through pages.
I created a similar setup, which works alright for me. I've had some problems with Selenium trying to click the next button and throwing a WebDriverException instead, likely because the button is not in view. Hence, instead of clicking the next button I read its href attribute and load the next page with driver.get(), avoiding an actual click and making the test more stable.
def test_fetch_google_links():
    links = []
    # Setup driver
    driver = webdriver.Firefox()
    driver.implicitly_wait(10)
    driver.maximize_window()
    # Visit google
    driver.get("https://www.google.com")
    # Enter search query
    search_data = driver.find_element_by_name("q")
    search_data.send_keys("test")
    # Submit search query
    search_button = driver.find_element_by_xpath("//button[@type='submit']")
    search_button.click()
    while True:
        # Find and collect all anchors
        anchors = driver.find_elements_by_xpath("//h3//a")
        links += [a.get_attribute("href") for a in anchors]
        try:
            # Find the next page button
            next_button = driver.find_element_by_xpath("//a[@id='pnnext']")
            location = next_button.get_attribute("href")
            driver.get(location)
        except NoSuchElementException:
            break
    # Do something with the links
    for l in links:
        print(l)
    print("Found {} links".format(len(links)))
    driver.quit()
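A side note on the StaleElementReferenceException the asker mentioned: the setup above avoids it by reading the href strings immediately after locating the anchors, so no WebElement reference is held across the driver.get() navigation, which is exactly when such references go stale. The collected strings can then be written out once at the end; a sketch, with the filename as a placeholder:

import csv

# write the collected links once, after the loop, instead of reopening
# the file for every link as in the question's code
with open("profile-links.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for link in links:
        writer.writerow([link])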