I have a program that goes to reddit.com and grabs an HTML element from it. However, about 1/10th of the time the old Reddit site loads instead, and I have to restart the program. Is there any shorter way to handle this error (basically restarting from the top)? I couldn't figure it out with a try/except.
browser = webdriver.Chrome(executable_path=r'C:\Users\jacka\Downloads\chromedriver_win32\chromedriver.exe')
browser.get("https://www.reddit.com/")
# grabs the html tag for the subreddit name
elem = browser.find_elements_by_css_selector("a[data-click-id='subreddit']")
# in the case that old reddit loads, it restarts the browser
if len(elem) == 0:
    browser.close()
    browser = webdriver.Chrome(executable_path=r'C:\Users\jacka\Downloads\chromedriver_win32\chromedriver.exe')
    browser.get("https://www.reddit.com/")
    # grabs the html tag for the subreddit name
    elem = browser.find_elements_by_css_selector("a[data-click-id='subreddit']")
Like @HSK has mentioned in the comments, you can use an infinite while loop to keep trying until you get what you want. Note that find_elements (plural) returns an empty list rather than raising, so use find_element (singular), which raises NoSuchElementException when old Reddit comes up; catch that, close the browser handle so you don't leak Chrome instances, and retry.
from selenium.common.exceptions import NoSuchElementException

while True:
    browser = webdriver.Chrome(executable_path=r'C:\Users\jacka\Downloads\chromedriver_win32\chromedriver.exe')
    try:
        browser.get("https://www.reddit.com/")
        # find_element (singular) raises NoSuchElementException on old reddit
        elem = browser.find_element_by_css_selector("a[data-click-id='subreddit']")
        break
    except NoSuchElementException:
        # old reddit loaded; close this browser and try again
        browser.close()
Solved thanks to @HSK. I put the code in a while loop that ran until it got the right version of Reddit.
# had to initialize elem so the loop would run
elem = []
while len(elem) == 0:
    browser = webdriver.Chrome(executable_path=r'C:\Users\jacka\Downloads\chromedriver_win32\chromedriver.exe')
    browser.get("https://www.reddit.com/")
    # grabs the html tag for the subreddit name
    elem = browser.find_elements_by_css_selector("a[data-click-id='subreddit']")
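If you're worried about this looping forever when Reddit keeps serving the old site, a capped variant works too (just a sketch building on the code above; max_retries is a name I've introduced for illustration):

elem = []
max_retries = 5
for attempt in range(max_retries):
    browser = webdriver.Chrome(executable_path=r'C:\Users\jacka\Downloads\chromedriver_win32\chromedriver.exe')
    browser.get("https://www.reddit.com/")
    elem = browser.find_elements_by_css_selector("a[data-click-id='subreddit']")
    if elem:
        break
    # close the browser that got old reddit before retrying
    browser.quit()
else:
    raise RuntimeError("still got old reddit after %d attempts" % max_retries)

This also closes each failed browser, so you don't accumulate Chrome windows across retries.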
I have a script that reads links from an Excel file. It opens each link fine, but when it hits an error it reports the error and the script stops. How can I make the script keep opening the remaining links instead of stopping on the page that has an error?
I'd welcome any suggestions or improvements to the script in general. This is just a part of my code.
for url in mylist:
    driver.get(url)
    driver.maximize_window()
    driver.implicitly_wait(val)
    timestamp = datetime.datetime.now().strftime('%d_%m_%Y')
    driver.save_screenshot(str(shark)+"/"+str(cont)+'_'+timestamp+'.png')
    cont += 1
Wrap your code block in a try/except as follows:
for url in mylist:
    try:
        driver.get(url)
        driver.maximize_window()
        driver.implicitly_wait(val)
        timestamp = datetime.datetime.now().strftime('%d_%m_%Y')
        driver.save_screenshot(str(shark)+"/"+str(cont)+'_'+timestamp+'.png')
        cont += 1
    except Exception:
        # skip this URL and move on to the next one
        continue
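A slightly more defensive variant (a sketch; the logging is my addition, not part of the original script) records which URL failed so you can retry it later instead of skipping it silently:

for url in mylist:
    try:
        driver.get(url)
        driver.maximize_window()
        driver.implicitly_wait(val)
        timestamp = datetime.datetime.now().strftime('%d_%m_%Y')
        driver.save_screenshot(str(shark)+"/"+str(cont)+'_'+timestamp+'.png')
        cont += 1
    except Exception as e:
        # note the failure and move on to the next URL
        print("failed on %s: %s" % (url, e))
        continue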
I'm just working on a simple script that goes through real estate listings and collects agents' personal websites. I ran into an issue when I came across a listing where the agent didn't have a website, in which case the script stopped working. I've now put in a try/except, which works until the except block is run; when that happens, the whole browser closes.
time.sleep(15)
for i in range(1,9):
    listing_page = driver.find_element_by_xpath('//*[@id="m_property_lst_cnt_realtor_more_'+str(i)+'"]').click()
    try:
        realtor_url = driver.find_element_by_xpath('//*[@id="lblMediaLinks"]/a').click()
        WebDriverWait(driver, 10).until(lambda d: len(driver.window_handles) == 2)
        driver.switch_to_window(driver.window_handles[1])
        WebDriverWait(driver, 10)
    except:
        print("Not found")
        continue
    driver.close()
    driver.switch_to_window(driver.window_handles[0])
    driver.get(home_page)
    time.sleep(10)
Is there any way I can revert back to the home page and restart the loop when the except block is run, and otherwise have the loop run as usual?
In my opinion, using exceptions for logic flow is not good practice. Exceptions should be exceptional: they should indicate that something unexpectedly went wrong.
Instead, use find_elements_* (plural) and check whether the returned collection is empty. If it's not empty, the link exists, so click on it, etc. If the collection is empty, return to home and start the next loop iteration.
for i in range(1,9):
    driver.find_element_by_xpath('//*[@id="m_property_lst_cnt_realtor_more_'+str(i)+'"]').click()
    realtor_url = driver.find_elements_by_xpath('//*[@id="lblMediaLinks"]/a')
    if len(realtor_url) > 0:
        realtor_url[0].click()
        WebDriverWait(driver, 10).until(lambda d: len(driver.window_handles) == 2)
        driver.switch_to_window(driver.window_handles[1])
        driver.close()
        driver.switch_to_window(driver.window_handles[0])
    driver.get(home_page)
BTW, .click() doesn't return anything, so assigning its return value to a variable will only ever give you None, which is why I removed those assignments.
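In other words (a two-line illustration of that point):

result = realtor_url[0].click()  # click() always returns None; don't assign it
realtor_url[0].click()           # just perform the action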
I am currently writing a python selenium script to take information of a website.
I have successfully got the data from page 1 of 100+ in the format I want, but I unfortunately can't get the program to run on and collect all the information from the subsequent pages.
When I inspect the target website, it shows me that the “Next” button is built like the below:
/body/div[@id='main-content']/div[@class='t6a-grid']/div[@class='mmargin-bottom-30']/div[@id='grid']/div[@class='row-margin-bottom-10']/div[@class='col-md-12 padding-left-0 padding-right-20']/ul[@class='pagination']/li[11]/a
Part of the script I have written is below. The “# this is navigate to next page” section in the script is the area that isn't currently working.
def get_links(driver, target):
    # this is to collect links that associate with all the profiles present in Freshfields website
    driver.get(target)
    # get links associated to profiles on result page
    list_links = []
    while True:
        list_ppl_link = driver.find_elements_by_xpath('//div[@class=" mix item col-xs-6 col-sm-4"]')
        for item in list_ppl_link:
            emp_name_obj = item.find_element_by_tag_name('a')
            emp_name = emp_name_obj.text
            emp_link = emp_name_obj.get_attribute('href')
            list_links.append({'emp_name':emp_name, 'emp_link':emp_link})
        try:
            # this is navigate to next page
            driver.find_element_by_xpath('//ul[@class="pagination"]/li').click()
            time.sleep(1)
        except NoSuchElementException:
            break
    return list_links
Please could somebody help me to understand how I can loop through the pages and collect the 1,960 records?
Try using something like the below, which keeps a page counter and clicks the next page's numbered link:
i = 1
while True:
    list_ppl_link = driver.find_elements_by_xpath('//div[@class=" mix item col-xs-6 col-sm-4"]')
    for item in list_ppl_link:
        emp_name_obj = item.find_element_by_tag_name('a')
        emp_name = emp_name_obj.text
        emp_link = emp_name_obj.get_attribute('href')
        list_links.append({'emp_name':emp_name, 'emp_link':emp_link})
    i = i + 1
    try:
        # navigate to the next page by clicking its page-number link
        driver.find_element_by_xpath('//ul[@class="pagination"]//li/a[contains(text(),"' + str(i) + '")]').click()
        time.sleep(1)
    except NoSuchElementException:
        break
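Alternatively, since the XPath you inspected shows the “Next” control as li[11] in the pagination list, you could click that arrow directly inside the same while loop (a sketch based on the path you posted; the li[11] index comes from your inspection and may shift if the pagination layout changes):

try:
    # click the "Next" arrow itself instead of a numbered page link
    driver.find_element_by_xpath('//ul[@class="pagination"]/li[11]/a').click()
    time.sleep(1)
except NoSuchElementException:
    break  # no more pages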
I am working on my first scraper and ran into an issue. My scraper accesses a website and saves links from each result page. Now, I only want it to go through 10 pages. The problem comes when the search results have fewer than 10 pages. I tried using a while loop along with a try statement, but it does not seem to work. After the scraper goes through the first page of results, it does not return any links on the successive pages; however, it does not give me an error, and it stops once it reaches 10 pages or hits the exception.
Here is a snippet of my code:
links = []
page = 1
while(page <= 10):
    try:
        # Get information from the propertyInfo class
        properties = WebDriverWait(driver, 10).until(lambda driver: driver.find_elements_by_xpath('//div[@class = "propertyInfo item"]'))
        # For each listing
        for p in properties:
            # Find all elements with a tags
            tmp_link = p.find_elements_by_xpath('.//a')
            # Get the link from the second element to avoid error
            links.append(tmp_link[1].get_attribute('href'))
        page += 1
        WebDriverWait(driver, 10).until(lambda driver: driver.find_element_by_xpath('//*[@id="paginador_siguiente"]/a').click())
    except ElementNotVisibleException:
        break
I really appreciate any pointers on how to fix this issue.
You are explicitly catching the ElementNotVisibleException exception and stopping on it; this way you won't see any error message. The error is probably in this line:
WebDriverWait(driver, 10).until(lambda driver: driver.find_element_by_xpath('//*[@id="paginador_siguiente"]/a').click())
I assume the lambda here should be a test that is run until it succeeds, so it shouldn't perform an action like a click. I actually believe you don't need to wait here at all: the page should already be fully loaded, so you can just click on the link:
driver.find_element_by_xpath('//*[@id="paginador_siguiente"]/a').click()
This will either move to the next page (and the WebDriverWait at the start of the loop will wait for it to load) or raise an exception if no next link is found.
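If you do want an explicit wait before the click, the idiomatic pattern (a sketch using Selenium's expected_conditions helpers, not the original poster's code) is to wait for clickability and click the element the wait returns:

from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# wait until the next-page link is clickable, then click it
next_link = WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.XPATH, '//*[@id="paginador_siguiente"]/a')))
next_link.click()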
Also, you'd better minimize the try ... except scope; this way you won't capture something unintentionally. E.g. here you only want to surround the next-link code, not the whole loop body:
# ...
while page <= 10:
    # Scrape this page
    properties = WebDriverWait(driver, 10).until(...)
    for p in properties:
        # ...
    page += 1
    # Try to pass to next page
    try:
        driver.find_element_by_xpath('//*[@id="paginador_siguiente"]/a').click()
    except ElementNotVisibleException:
        # Break if no next link is found
        break
I'm trying to take some information from an HTML element using Selenium with Python, and I'm unsure how to save it. I'm kind of new to programming, but literate enough that I know how to write code; it's just hard to research answers and adapt them to my code. I've looked on Google and can't seem to find anything that helps with what I specifically need.
Here is the HTML element I need to get information from:
<span id="ctl00_plnMain_rptAssigmnetsByCourse_ctl00_lblOverallAverage">99.05</span>
I need to retrieve the 99.05 and store it in a variable named "avg."
Here is the code I have for the Selenium test.
username = raw_input("Username: ")
password = raw_input("Password: ")
browser = webdriver.Firefox() # Get local session of firefox
browser.get("https://hac.mckinneyisd.net/homeaccess/default.aspx") # Load page
elem = browser.find_element_by_name("ctl00$plnMain$txtLogin") # Find the query box
elem.send_keys(username)
elem = browser.find_element_by_name("ctl00$plnMain$txtPassword") # Find the password box
elem.send_keys(password + Keys.RETURN)
time.sleep(0.2) # Let the page load
elem = browser.find_element_by_link_text("Classwork").click()
time.sleep(0.2)
???????????????
browser.close()
What should I put in the ???... to take the 99.05 from the object and save it as "avg?" I have tried:
content = elem.text("td[@id='ctl00....lblOverallAverage']")
...but I get an error saying that I can't do that because it has no type.
Try:
elem = browser.find_element_by_id("ctl00_plnMain_rptAssigmnetsByCourse_ctl00_lblOverallAverage")
avg = elem.text
(In the Python bindings, text is a property, not a getText() method as in Java.)
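Since the value is loaded with the page, the fixed time.sleep(0.2) may occasionally be too short; a more robust variant (a sketch, not the original answer) waits for the element before reading its text:

from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# wait up to 10 seconds for the average to appear, then read it
elem = WebDriverWait(browser, 10).until(
    EC.presence_of_element_located(
        (By.ID, "ctl00_plnMain_rptAssigmnetsByCourse_ctl00_lblOverallAverage")))
avg = elem.text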