Handling Selenium exceptions and alert boxes in Python

I am using Selenium for some automation work. Sorry for the long description; I am new to Python.
There is a students' result portal where we need to enter a seat number and click an OK button to see the result. On clicking submit, a new window opens where the result is displayed in an HTML table.
If the seat number is invalid, an alert box opens indicating an invalid seat number, with an OK option to close it.
Problem:
1. I want to loop through the roll numbers from, let's say, 1500 to 1600. If roll number 1501 is invalid, an alert box is shown; I want to close the alert box and continue with roll number 1502. If the value of the result is more than 96%, I want to increase a count by 1.
2. Once the result is opened and the calculation is done, I want to close the newly opened window, enter the next seat number, and continue with the calculation.
This is my code:
from selenium import webdriver
from selenium.common.exceptions import UnexpectedAlertPresentException
from selenium.webdriver.common.alert import Alert
import time

options = webdriver.ChromeOptions()
options.add_experimental_option('excludeSwitches', ['enable-logging'])
web = webdriver.Chrome(options=options, executable_path='J:\stuff\soft\chromedriver.exe')
web.get('https://msbte.org.in/DISRESLIVE2021CRSLDSEP/frmALYSUM21PBDisplay.aspx')

# variable to store the result
resultCount = 0
rlstart = 156857
rlend = 157299
try:
    web.implicitly_wait(5)
    pdl = web.current_window_handle
    for x in range(rlstart, rlend):
        web.implicitly_wait(1)
        inp = web.find_element_by_xpath('//*[@id="txtEnrollSeatNo"]')
        inp.send_keys(x)
        submit = web.find_element_by_xpath('//*[@id="btnSubmit"]')
        submit.click()
        web.implicitly_wait(2)
        web.implicitly_wait(2)
        # pdl = web.current_window_handle
        handles = web.window_handles
        for handle in handles:
            if(handle != pdl):
                switch_to_alert().accept()
                web.switch_to.window(handle)
                getresult = web.find_element_by_css_selector('body > div > div:nth-child(3) > div:nth-child(4) > table > tbody > tr:nth-child(5) > td:nth-child(3) > strong').text
                if(getresult > 96.00):
                    resultCount += 1
                web.close()
                web.switch_to.window(pdl)
        web.implicitly_wait(2)
except UnexpectedAlertPresentException:
    alert_obj = web.switch_to.alert
    alert_obj.accept()
finally:
    print("end")
    web.quit()
    print(resultCount)
This is the error I get.

You can go through the code below. I have not edited your code, but the code below does what you asked for.
The loop condition is while rlstart != rlend+1: the rlend+1 is needed because, when the count is incremented, 156860 becomes 156861, and without the +1 the loop would exit once rlstart reaches 156861, without giving 156861's result.
from selenium import webdriver
import time

driver = webdriver.Chrome(executable_path="path")
driver.maximize_window()
driver.implicitly_wait(10)
driver.get("https://msbte.org.in/DISRESLIVE2021CRSLDSEP/frmALYSUM21PBDisplay.aspx")

rlstart = 156857
rlend = 156860  # Have tested only for a few seat numbers.
# rlend = 157299

while rlstart != rlend + 1:
    try:
        driver.find_element_by_id("txtEnrollSeatNo").send_keys(rlstart)  # Send the seat number.
        driver.find_element_by_id("btnSubmit").click()  # Click the submit button; the exception happens after this.
        # Collect all the windows that opened up.
        handles = driver.window_handles
        # Switch to the other window and extract the seat number and percentage.
        driver.switch_to.window(handles[1])
        time.sleep(1)
        seatno = driver.find_element_by_xpath("/html/body/div/div[2]/table/tbody/tr[2]/td[6]").text
        per = driver.find_element_by_xpath("/html/body/div/div[2]/div[2]/table/tbody/tr[5]/td[3]/strong").text
        print("Result of Seat no: {}".format(seatno))
        print("Percentage: {}".format(per))
        # The percentage is a decimal stored as a string, so convert it to float before comparing.
        if float(per) > 96.0:
            rlend += 1
            print("new rlend: {}".format(rlend))
        # Close the new window, switch back to the parent window and clear the field before entering a new seat number.
        driver.close()
        driver.switch_to.window(handles[0])
        driver.find_element_by_id("txtEnrollSeatNo").clear()
    except:
        print("Invalid Seat No : {}".format(rlstart))
        # Handle the alert, clear the field for the next seat number and continue.
        # No need to switch between windows since no new window has opened up.
        driver.switch_to.alert.accept()
        driver.find_element_by_id("txtEnrollSeatNo").clear()
    rlstart += 1
driver.quit()
Output:
Result of Seat no: 156857
Percentage: 95.71
Result of Seat no: 156858
Percentage: 96.63
new rlend: 156861
Result of Seat no: 156859
Percentage: 86.11
Result of Seat no: 156860
Percentage: 90.29
Result of Seat no: 156861
Percentage: 96.17
new rlend: 156862
Result of Seat no: 156862
Percentage: 75.00

There are several issues with your code.
web.implicitly_wait(1) does not insert an actual pause into your code. It only sets a timeout: how long Selenium will wait for an element to appear on the page. So when you define it twice,
web.implicitly_wait(2)
web.implicitly_wait(2)
this does not give you a pause of 4 seconds; it just sets a 2-second timeout twice without pausing your program flow.
Also, you don't need to set this multiple times; set it once and forget about it.
We also usually set the timeout to 10-30 seconds, not 1-2 seconds. A very short timeout can cause failures with a slow internet connection, slow website responses, etc.
In the case of a correct seat number no alert appears, but the data is opened in a new window. So when the seat number is correct, switch_to_alert().accept() will fail, and this is what actually happens, since no alert appeared.
I was working on a corrected version of the code, but other people already gave you working code, so you can read the explanations here and use the working code there :)
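That said, if you want to make the alert handling explicit rather than relying on a bare except, here is a minimal sketch of the idea (not tested against this site, reusing the element IDs from the question): wait briefly for an alert and fall back to the new-window path on timeout.
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

web = webdriver.Chrome()
web.implicitly_wait(10)  # set the element-lookup timeout once; 10+ seconds is safer
web.get('https://msbte.org.in/DISRESLIVE2021CRSLDSEP/frmALYSUM21PBDisplay.aspx')

web.find_element_by_id('txtEnrollSeatNo').send_keys(str(156857))
web.find_element_by_id('btnSubmit').click()

try:
    # Wait up to 3 seconds for an alert; a TimeoutException means no alert appeared.
    WebDriverWait(web, 3).until(EC.alert_is_present())
    web.switch_to.alert.accept()  # invalid seat number
except TimeoutException:
    # No alert: the result opened in a new window instead.
    web.switch_to.window(web.window_handles[1])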

Things to note:
You should not set implicit waits more than once.
Use explicit waits, or as a last resort use time.sleep(); in the code below I have put sleeps just for visual purposes.
You are comparing a string with a float, which is wrong.
There is a proper way to switch to windows and to the alert; please see below.
Also, having said that, I would not recommend mixing implicit with explicit waits.
I have reduced the value of rlend for testing purposes; you will have to increase it and check that it still works.
Code:
web = webdriver.Chrome(driver_path)
web.maximize_window()
# web.implicitly_wait(50)
web.get("https://msbte.org.in/DISRESLIVE2021CRSLDSEP/frmALYSUM21PBDisplay.aspx")
wait = WebDriverWait(web, 20)

resultCount = 0
rlstart = 156857
rlend = 156861
# 157299

try:
    for x in range(rlstart, rlend):
        original_window = web.current_window_handle
        seat_input_box = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "input[id='txtEnrollSeatNo']")))
        time.sleep(1)
        seat_input_box.clear()
        seat_input_box.send_keys(rlstart)
        rlstart = rlstart + 1
        submit = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "input[id='btnSubmit']")))
        submit.click()
        try:
            print("Alert was not present, but a new window was.")
            handles = web.window_handles
            web.switch_to.window(handles[1])
            time.sleep(1)
            web.maximize_window()
            time.sleep(2)
            web.execute_script("window.scrollTo(0, 300)")
            time.sleep(2)
            getresult = wait.until(EC.presence_of_element_located((By.XPATH, "//td[contains(text(),'Aggregate Marks : ')]/following-sibling::td[2]/strong"))).text
            getresult_dec = float(getresult)
            if getresult_dec > 96.00:
                resultCount = resultCount + 1
            print("Closing the result window.")
            web.close()
        except:
            print("An alert is present.")
            a = web.switch_to.alert
            a.accept()
            web.switch_to.default_content()
        time.sleep(3)
        web.switch_to.window(original_window)
except:
    pass
print(resultCount)
Imports:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
import time

Related

Selenium is flying through the program and not waiting for data to appear

No matter what kind of waits, expected conditions, or sleeps I put in, I can't get it to slow down enough to register my inventory.
The problem is that the script does the input of 99, then skips back to L or Large before the inventory can register. So it goes to Large, finishes that part, then goes to 1X, then before it can register the inventory it goes back to Large, then to 2X, and again before it can register the inventory it goes back to Large, before it finally continues to 3X.
import selenium
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
from lxml import html
from time import sleep
import time
import csv

PATH = 'C:\Program Files (x86)\chromedriver.exe'
driver = webdriver.Chrome(PATH)
driver.get('https://www.rosegal.com/plus-size-tops-120/')
driver.maximize_window()

# This is for pagination.
for x in range(1, 60):
    # This is for the rows the product is on.
    for r in range(1, 15):
        # This is for the product on the row.
        for m in range(1, 4):
            sleep(2)
            # This resets the page of products.
            driver.get(f"https://www.rosegal.com/plus-size-tops-120/{x}.html")
            a = ActionChains(driver)
            product = driver.find_element_by_xpath(f'/html[1]/body[1]/div[1]/div[1]/div[1]/div[2]/div[2]/div[5]/ul[{r}]/li[{m}]/div[1]/div[2]/p[1]/a[1]')
            a.move_to_element(product).click().perform()
            sleep(1)
            title = driver.find_element_by_xpath("/html[1]/body[1]/div[1]/section[1]/div[1]/div[2]/div[1]/h1[1]").text
            price = driver.find_element_by_xpath("//b[contains(@class,'my_shop_price')]").text
            # This clicks the description button that causes the description list to show.
            description = driver.find_element_by_xpath('//*[@id="page"]/section/div/div[2]/div[9]/ul/li[2]').click()
            sleep(2)
            material = driver.find_element_by_xpath("//div[contains(@class,'xxkkk20')]").text
            # This controls the color choice. There could be 1 to 5 colors for a product.
            for c in range(1, 5):
                # Since there could be 1 to 5 colors to choose from, this try is a simple way to do the colors.
                try:
                    # This calls up the different colors.
                    pat1 = driver.find_element_by_xpath(f'//*[@id="select-attr-0"]/a[{c}]').click()
                    color1 = driver.find_element_by_css_selector(".logsss_event_cl.itemAttr.current").get_attribute("data-value")
                    # link = driver.find_element_by_css_selector(".logsss_event_cl.itemAttr.current>img").get_attribute("src")
                    print(color1)
                    # This for loop selects the SIZE.
                    for s in range(1, 8):
                        # Since there could be 6 to 8 sizes, this try checks for them.
                        try:
                            # This calls up the different sizes.
                            Size_button = driver.find_element_by_xpath(f'//*[@id="select-attr-1"]/span[2]/a[{s}]').click()
                            image1 = driver.find_element_by_xpath('//*[@id="goods_thumb_content"]/ul/li[1]').get_attribute("data-bigimg")
                            image2 = driver.find_element_by_xpath('//*[@id="goods_thumb_content"]/ul/li[2]').get_attribute("data-bigimg")
                            image3 = driver.find_element_by_xpath('//*[@id="goods_thumb_content"]/ul/li[3]').get_attribute("data-bigimg")
                            # Since there can be a fourth image, this try takes care of it.
                            try:
                                image4 = driver.find_element_by_xpath('//*[@id="goods_thumb_content"]/ul/li[4]').get_attribute("data-bigimg")
                            except:
                                pass
                            size = driver.find_element_by_xpath(f'//body[1]/div[1]/section[1]/div[1]/div[2]/div[7]/span[2]/a[{s}]').text
                            sku = driver.find_element_by_xpath('//*[@id="js_addToCart"]').get_attribute("data-goods-sn")
                            ####### This is where the problem starts #######
                            # First you put a high number in the QTY input box.
                            input = driver.find_element_by_xpath("//input[contains(@class,'fl num logsss_event_cl')]").send_keys('99')
                            # Then you hit the plus sign.
                            plus = driver.find_element_by_xpath('//*[@id="select-attr-1"]/span[2]/a[1]').click()
                            sleep(8)
                            # Then the true inventory shows up.
                            inventory = driver.find_element_by_xpath("//input[contains(@class,'fl num logsss_event_cl')]").get_attribute("value")
                            ##### The problem: the script does the input of 99, then skips back to L or Large before the
                            ##### inventory can register. So it goes to Large, finishes that part, goes to 1X, then back
                            ##### to Large before the inventory registers, then to 2X, back to Large again, and so on
                            ##### before it continues to 3X.
                            sleep(8)
                            print(sku, size, inventory)
                        except:
                            pass
                except:
                    pass
I took all the try: and except: out of that section and the sleeps worked. Apparently sleeps and try: don't go together.
Thanks for all your help
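As an aside, instead of the fixed sleep(8) calls, an explicit wait on the quantity box's value is another option. This is only a sketch under the assumption that the site rewrites the input's value attribute once the real inventory is known; the XPath is taken from the question and wait_for_inventory is a made-up helper name.
from selenium.webdriver.support.ui import WebDriverWait

QTY_XPATH = "//input[contains(@class,'fl num logsss_event_cl')]"  # selector from the question

def wait_for_inventory(driver, previous_value, timeout=15):
    """Block until the quantity box shows something other than previous_value, then return it."""
    wait = WebDriverWait(driver, timeout)
    return wait.until(
        lambda d: d.find_element_by_xpath(QTY_XPATH).get_attribute("value") != previous_value
        and d.find_element_by_xpath(QTY_XPATH).get_attribute("value")
    )

# Inside the size loop, instead of sleep(8):
#   driver.find_element_by_xpath(QTY_XPATH).send_keys('99')
#   driver.find_element_by_xpath(plus_xpath).click()   # plus_xpath: whatever selector triggers the update
#   inventory = wait_for_inventory(driver, '99')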

How to close a browser tab with Selenium at a given time

I have been working on a script that automatically joins Google Meet meetings. It logs in to Gmail and then goes to the meeting automatically when it is time for the meeting. But now I am having problems with leaving the meeting after a certain time. I want to just close a browser tab, and thus the meeting, then continue checking for the next meeting. I think the last while loop, which is intended to close the Chrome tab after the meeting is done, does not run at all. I have tried replacing it with print statements to see if it is executed, but it is not, and I do not know why.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.options import Options
import datetime
import time
import signal

now = datetime.datetime.now()
current_time = now.strftime("%H:%M / %A")
justtime = now.strftime("%H:%M")
print(current_time)

def Glogin(mail_address, password):
    # os.system("obs --startvirtualcam &")
    # Login Page
    driver.get(
        'https://accounts.google.com/ServiceLogin?hl=en&passive=true&continue=https://www.google.com/&ec=GAZAAQ')
    # input Gmail
    driver.find_element_by_id("identifierId").send_keys(mail_address)
    driver.find_element_by_id("identifierNext").click()
    driver.implicitly_wait(10)
    # input Password
    driver.find_element_by_xpath(
        '//*[@id="password"]/div[1]/div/div[1]/input').send_keys(password)
    driver.implicitly_wait(10)
    driver.find_element_by_id("passwordNext").click()
    driver.implicitly_wait(10)
    # go to google home page
    driver.get('https://google.com/')
    driver.implicitly_wait(100)
    driver.get(sub)
    # turn off Microphone
    time.sleep(1)
    # driver.find_elements_by_class_name("JRY2Pb")[0].click()
    driver.find_elements_by_class_name("JRY2Pb")[0].click()
    # switch camera
    time.sleep(2)
    for x in driver.find_elements_by_class_name("JRtysb"):
        x.click()
    time.sleep(2)
    for a in driver.find_elements_by_class_name("FwR7Pc"):
        a.click()
    time.sleep(2)
    for b in driver.find_elements_by_class_name("XhPA0b"):
        b.click()
    time.sleep(2)
    driver.find_element_by_tag_name('body').send_keys(Keys.TAB + Keys.TAB + Keys.ARROW_DOWN + Keys.ENTER)
    time.sleep(1)
    webdriver.ActionChains(driver).send_keys(Keys.ESCAPE).perform()
    time.sleep(2)
    # Join meet
    time.sleep(1)
    driver.implicitly_wait(2000)
    driver.find_element_by_css_selector(
        'div.uArJ5e.UQuaGc.Y5sE8d.uyXBBb.xKiqt').click()

# assign email id and password
mail_address = 'email'
password = 'password'

# create chrome instance
opt = Options()
opt.add_argument('--disable-blink-features=AutomationControlled')
opt.add_argument('--start-maximized')
opt.add_experimental_option("prefs", {
    "profile.default_content_setting_values.media_stream_mic": 1,
    "profile.default_content_setting_values.media_stream_camera": 1,
    "profile.default_content_setting_values.geolocation": 0,
    "profile.default_content_setting_values.notifications": 1
})

while True:
    if current_time == "05:00 / Wednesday":
        sub = "link"
        driver = webdriver.Chrome(options=opt, executable_path=r'/usr/bin/chromedriver')
        Glogin(mail_address, password)
        break

while True:
    if current_time == "05:01 / Wednesday":
        driver.find_element_by_tag_name('body').send_keys(Keys.CONTROL + 'w')
        break
If the last while loop isn't running, it's because the previous while True loop never broke.
I suspect it has something to do with your condition current_time == "05:00 / Wednesday", which means current_time is never being set equal to "05:00 / Wednesday".
Based on the limited context, I can only suggest two things.
First off, don't use while True loops with only one if statement inside; use the if-condition as your while loop condition.
Secondly, you may want to reset current_time in your loop. Modify your loop to look something like this:
run_loop = True  # will be False when we want our loop to quit
while run_loop:
    if current_time == "05:00 / Wednesday":
        sub = "link"
        driver = webdriver.Chrome(options=opt, executable_path=r'/usr/bin/chromedriver')
        Glogin(mail_address, password)
        run_loop = False  # break us out of the loop
    else:  # we keep checking the time to see if it's 05:00 yet
        now = datetime.datetime.now()
        current_time = now.strftime("%H:%M / %A")
The above code will continuously check if the time meets your conditions, and then exit appropriately.
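The same pattern can be applied to the tab-closing loop. Here is a rough sketch (not part of the original answer) that keeps refreshing current_time until the leave time is reached and then closes the tab with the Ctrl+W shortcut already used in the question; it assumes driver is the instance created in the first loop.
import datetime
import time
from selenium.webdriver.common.keys import Keys

in_meeting = True
while in_meeting:
    now = datetime.datetime.now()
    current_time = now.strftime("%H:%M / %A")
    if current_time == "05:01 / Wednesday":
        # Ctrl+W closes the active tab, as in the original script.
        driver.find_element_by_tag_name('body').send_keys(Keys.CONTROL + 'w')
        in_meeting = False
    else:
        time.sleep(10)  # avoid a busy loop while waiting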

How to iterate through a list of web elements that refreshes every 10 seconds?

I am trying to iterate through a list that refreshes every 10 sec.
This is what I have tried:
driver.get("https://www.winmasters.ro/ro/live-betting/")
events = driver.find_elements_by_css_selector('.event-wrapper.v1.event-live.odds-hidden.event-sport-1')
for i in range(len(events)):
try:
event = events[i]
name = event.find_element_by_css_selector('.event-details-team-name.event-details-team-a')# the error occurs here
except: # NoSuchElementException or StaleElementReferenceException
time.sleep(3) # i have tried up to 20 sec
event = events[i]
name = event.find_element_by_css_selecto('.event-details-team-name.event-details-team-a')
This did not work, so I tried another except clause:
except:  # second try that also did not work
    element = WebDriverWait(driver, 20).until(
        EC.presence_of_all_elements_located((By.CSS_SELECTOR, '.event-details-team-name.event-details-team-a'))
    )
    name = event.find_element_by_css_selector('.event-details-team-name.event-details-team-a')
Now I am assigning a placeholder value to name that I will never use, like this:
try:
    event = events[i]
    name = event.find_element_by_css_selector('.event-details-team-name.event-details-team-a')
except:
    name = "blablabla"
With this code, when the page refreshes I get about 7 or 8 "blablabla" values until it finds my selector on the page again.
You can get all the required data using JavaScript.
The code below will give you a list of event maps with all the details instantly, and without NoSuchElementException or StaleElementReferenceException errors:
me_id : unique identifier
href : link with details, which you can use to open the event details
team_a : name of the first team
team_a_score : score of the first team
team_b : name of the second team
team_b_score : score of the second team
event_status : status of the event
event_clock : time of the event
events = driver.execute_script('return [...document.querySelectorAll(\'[data-uat="live-betting-overview-leagues"] .events-for-league .event-live\')].map(e=>{return {me_id:e.getAttribute("me_id"), href:e.querySelector("a.event-details-live").href, team_a:e.querySelector(".event-details-team-a").textContent, team_a_score:e.querySelector(".event-details-score-1").textContent, team_b:e.querySelector(".event-details-team-b").textContent, team_b_score:e.querySelector(".event-details-score-2").textContent, event_status:e.querySelector(\'[data-uat="event-status"]\').textContent, event_clock:e.querySelector(\'[data-uat="event-clock"]\').textContent}})')

for event in events:
    print(event.get('me_id'))
    print(event.get('href'))  # using the href you can open the event details with: driver.get(event.get('href'))
    print(event.get('team_a'))
    print(event.get('team_a_score'))
    print(event.get('team_b'))
    print(event.get('team_b_score'))
    print(event.get('event_status'))
    print(event.get('event_clock'))
One primary problem is that you are acquiring all of the elements up front and then iterating through that list. As the page itself updates frequently, the elements you've already acquired go "stale", meaning they are no longer associated with current DOM objects. When you try to use those stale elements, Selenium throws a StaleElementReferenceException because it has no way of doing anything with those now out-of-date objects.
One way to overcome this is to acquire and use an element only right as you need it, rather than fetching them all up front. I personally feel the cleanest approach is to use the CSS :nth-child() approach:
from selenium import webdriver

def main():
    base_css = '.event-wrapper.v1.event-live.odds-hidden.event-sport-1'
    driver = webdriver.Chrome()
    try:
        driver.get("https://www.winmasters.ro/ro/live-betting/")
        # Get a list of all elements
        events = driver.find_elements_by_css_selector(base_css)
        print("Found {} events".format(len(events)))
        # Iterate through the list, keeping track of the index;
        # note that nth-child referencing begins at index 1, not 0
        for index, _ in enumerate(events, 1):
            name = driver.find_element_by_css_selector("{}:nth-child({}) {}".format(
                base_css,
                index,
                '.event-details-team-name.event-details-team-a'
            ))
            print(name.text)
    finally:
        driver.quit()

if __name__ == "__main__":
    main()
If I run the above script, I get this output:
$ python script.py
Found 2 events
Hapoel Haifa
FC Ashdod
Now, as the underlying webpage really does update a lot, there is still a decent chance you can get a StaleElementReferenceException (SERE). To overcome that you can use a retry decorator (pip install retry to get the package) to handle the SERE and reacquire the element:
import retry
from selenium import webdriver
from selenium.common.exceptions import StaleElementReferenceException

@retry.retry(StaleElementReferenceException, tries=3)
def get_name(driver, selector):
    elem = driver.find_element_by_css_selector(selector)
    return elem.text

def main():
    base_css = '.event-wrapper.v1.event-live.odds-hidden.event-sport-1'
    driver = webdriver.Chrome()
    try:
        driver.get("https://www.winmasters.ro/ro/live-betting/")
        events = driver.find_elements_by_css_selector(base_css)
        print("Found {} events".format(len(events)))
        for index, _ in enumerate(events, 1):
            name = get_name(
                driver,
                "{}:nth-child({}) {}".format(
                    base_css,
                    index,
                    '.event-details-team-name.event-details-team-a'
                )
            )
            print(name)
    finally:
        driver.quit()

if __name__ == "__main__":
    main()
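If you would rather not add the retry dependency, a small hand-rolled retry achieves the same effect. This is just a sketch of the idea, not part of the original answer:
from selenium.common.exceptions import StaleElementReferenceException

def get_name_with_retries(driver, selector, tries=3):
    """Re-find the element and retry on StaleElementReferenceException, up to `tries` times."""
    for attempt in range(tries):
        try:
            return driver.find_element_by_css_selector(selector).text
        except StaleElementReferenceException:
            if attempt == tries - 1:
                raise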
Now, despite the above examples, I think you still have issues with your CSS selectors, which is the primary reason for the NoSuchElement exceptions. I can't help with that without a better description of what you are actually trying to accomplish with this script.

Python Selenium: how to click on table content when changing table page

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.wait import WebDriverWait
from bs4 import BeautifulSoup
import time

url = "https://www.bungol.ca/"
driver = webdriver.Firefox(executable_path='/usr/local/bin/geckodriver')
driver.get(url)

# Select toronto by default
driver.find_element_by_xpath("""/html/body/section/div[2]/div/div[1]/form/div/select/optgroup[1]/option[1]""").click()
time.sleep(1)
driver.find_element_by_xpath("""/html/body/section/div[2]/div/div[1]/form/div/button""").click()
driver.find_element_by_xpath("""/html/body/nav/div[1]/ul[1]/li[3]/select/option[8]""").click()
# select last 2 years
driver.find_element_by_xpath("""//*[@id="activeListings"]""").click()
# opening sold listings in that area
driver.find_element_by_xpath("""/html/body/div[5]/i""").click()  # closes property type slide
driver.find_element_by_xpath("""//*[@id="navbarDropdown"]""").click()
driver.find_element_by_xpath("""//*[@id="listViewToggle"]""").click()

def data_collector():
    hidden_next = driver.find_element_by_class_name("nextPaginate")
    # inputs in textbox
    inputElement = driver.find_element_by_id('navbarSearchAddressInput')
    inputElement.send_keys('M3B2B6')
    time.sleep(1)
    # inputElement.send_keys(Keys.ENTER)
    row_count = 3
    table = driver.find_elements_by_css_selector("""#listViewTableBody""")
    while hidden_next.is_displayed():  # while there is a next page button to be pressed
        time.sleep(3)  # delay for table refresh
        # row_count = len(driver.find_elements_by_css_selector("""html body#body div#listView.table-responsive table#listViewTable.table.table-hover.mb-0 tbody#listViewTableBody tr.mb-2"""))
        for row in range(row_count):  # loop through the rows found
            # alternate rows by changing the tr index
            driver.find_element_by_xpath("""/html/body/div[8]/table/tbody/tr[""" + str(row + 1) + """]/td[1]""").click()
            time.sleep(2)
            print(driver.find_element_by_css_selector("""#listingStatus""").text)  # sold price
            # closes the pop up after getting the data
            driver.find_element_by_css_selector('.modal-xl > div:nth-child(1) > div:nth-child(1) > button:nth-child(1)').click()
            time.sleep(1)
        # clicks next page button for the table
        driver.find_element_by_xpath("""//*[@id="listViewNextPaginate"]""").click()

if __name__ == "__main__":
    data_collector()
The code loops through all the rows in the first table (currently set to 3 for testing), clicks on each row so that the pop-up shows up, grabs the information, and closes the pop-up. But when it clicks through to the next page, it doesn't click on any of the rows of the second page. It doesn't show an error for not finding the row XPath either; instead it shows an error for the pop-up close button, because the pop-up never opened, since the row was never clicked to display it.
How do I make it click the rows when the table flips to the next page?
For table reference: https://www.bungol.ca/map/location/toronto/?
Close the property slider on the left, then click Tools -> Open List.
In my browser I also can't open the pop-up when I click on a row on the second page, so I think this may be a fault of the website.
If you want to check whether an element exists, you can use this code:
from selenium.common.exceptions import NoSuchElementException

def check_exists_by_xpath(xpath, driver):
    try:
        driver.find_element_by_xpath(xpath)
    except NoSuchElementException:
        return False
    return True
Try this. My understanding is that your script goes through the listings, opens a listing, grabs the listing status, closes the listing, and does the same for all the listings.
If my understanding is correct, the code below may help you. It is better to replace the implicit wait and time.sleep() calls with explicit waits and to clean up the functions.
Having said that, I did not fully test the code, but it did navigate to more than one page of listings and collected data.
from selenium.webdriver import Firefox
from selenium.webdriver.support.select import Select
import time

driver = Firefox(executable_path=r'path to geckodriver.exe')
driver.get('https://www.bungol.ca/')
driver.maximize_window()
driver.implicitly_wait(10)

# Select toronto by default
driver.find_element_by_css_selector('#locationChoice button[type="submit"]').click()
sold_in_the_last = Select(driver.find_element_by_id('soldInTheLast'))
sold_in_the_last.select_by_visible_text('2 Years')
driver.find_element_by_id('activeListings').click()

# opening sold listings in that area
driver.find_element_by_css_selector('#leftSidebarClose>i').click()
driver.find_element_by_id('navbarDropdown').click()
driver.find_element_by_id('listViewToggle').click()

def get_listings():
    listings_table = driver.find_element_by_id('listViewTableBody')
    listings_table_rows = listings_table.find_elements_by_tag_name('tr')
    return listings_table_rows

def get_sold_price(listing):
    listing.find_element_by_css_selector('td:nth-child(1)').click()
    time.sleep(2)
    sold_price = driver.find_element_by_id('listingStatus').text
    time.sleep(2)
    close = driver.find_elements_by_css_selector('.modal-content>.modal-body>button[class="close"]')
    close[2].click()
    time.sleep(2)
    return sold_price

def data_collector():
    data = []
    time.sleep(2)
    next = driver.find_element_by_id('listViewNextPaginate')
    # get all the listings prior to the last page
    while next.is_displayed():
        listings = get_listings()
        for listing in listings:
            data.append(get_sold_price(listing))
        next.click()
    # get listings from the last page
    listings = get_listings()
    for listing in listings:
        data.append(get_sold_price(listing))
    return data

if __name__ == '__main__':
    from pprint import pprint
    data = data_collector()
    pprint(data)
    print(len(data))
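As noted above, the time.sleep() calls are better replaced with explicit waits. Here is a hedged sketch of what get_sold_price might look like with WebDriverWait; the IDs and selectors are taken from the answer above, and it assumes, as in the answer, that the third matching close button (close[2]) dismisses the pop-up:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

def get_sold_price_explicit(listing, timeout=10):
    """Sketch: same flow as get_sold_price, but driven by explicit waits instead of time.sleep()."""
    wait = WebDriverWait(driver, timeout)
    listing.find_element_by_css_selector('td:nth-child(1)').click()
    # Wait for the pop-up to render before reading the status text.
    sold_price = wait.until(EC.visibility_of_element_located((By.ID, 'listingStatus'))).text
    # Wait until the close buttons exist, then click the third one as the answer does.
    close = wait.until(EC.presence_of_all_elements_located(
        (By.CSS_SELECTOR, '.modal-content>.modal-body>button[class="close"]')))
    close[2].click()
    # Wait for the pop-up to disappear so the next row click is not intercepted.
    wait.until(EC.invisibility_of_element_located((By.ID, 'listingStatus')))
    return sold_price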

Scraping an updating JavaScript page in Python

I've been working on a research project that aims to obtain a list of reference articles from the Brazil Hemeroteca (the desired page reference, http://memoria.bn.br/DocReader/720887x/839, needs to be collected from two hidden elements on the following page: http://memoria.bn.br/DocReader/docreader.aspx?bib=720887x&pasta=ano%20189&pesq=Milho). I asked a question a few weeks back that was answered, and I was able to get things running well in that regard, but now I've hit a new snag and I'm not exactly sure how to get around it.
The problem is that after the first form is filled in, the page redirects to a second page, which is a JavaScript/AJAX-enabled page, and I need to spool through all of the matches, which is done by clicking a button at the top of the page. The problem I'm encountering is that when clicking the next page button I'm dealing with elements on the page that are updating, which leads to stale elements. I've tried to implement a few pieces of code to detect when this "stale" effect occurs, to indicate that the page has changed, but this has not provided much luck. Here is the code I've implemented:
import urllib
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select
import time

saveDir = "C:/tmp"
print("Opening Page...")
browser = webdriver.Chrome()
url = "http://bndigital.bn.gov.br/hemeroteca-digital/"
browser.get(url)

print("Searching for elements")
fLink = ""
fails = 0
frame_ref = browser.find_elements_by_tag_name("iframe")[0]
iframe = browser.switch_to.frame(frame_ref)
journal = browser.find_element_by_id("PeriodicoCmb1_Input")

search_journal = "Relatorios dos Presidentes dos Estados Brasileiros (BA)"
search_timeRange = "1890 - 1899"
search_text = "Milho"

xpath_form = "//input[@name='PesquisarBtn1']"
xpath_journal = "//li[text()='" + search_journal + "']"
xpath_timeRange = "//input[@name='PeriodoCmb1' and not(@disabled)]"
xpath_timeSelect = "//li[text()='" + search_timeRange + "']"
xpath_searchTerm = "//input[@name='PesquisaTxt1']"

print("Locating Journal/Periodical")
journal.click()
dropDownJournal = WebDriverWait(browser, 60).until(EC.presence_of_element_located((By.XPATH, xpath_journal)))
dropDownJournal.click()

print("Waiting for Time Selection")
try:
    timeRange = WebDriverWait(browser, 20).until(EC.presence_of_element_located((By.XPATH, xpath_timeRange)))
    timeRange.click()
    time.sleep(1)
    print("Locating Time Range")
    dropDownTime = WebDriverWait(browser, 20).until(EC.presence_of_element_located((By.XPATH, xpath_timeSelect)))
    dropDownTime.click()
    time.sleep(1)
except:
    print("Failed...")

print("Adding Search Term")
searchTerm = WebDriverWait(browser, 20).until(EC.presence_of_element_located((By.XPATH, xpath_searchTerm)))
searchTerm.clear()
searchTerm.send_keys(search_text)
time.sleep(5)

print("Perform search")
submitButton = WebDriverWait(browser, 20).until(EC.presence_of_element_located((By.XPATH, xpath_form)))
submitButton.click()

# Wait for the second page to load, pull what we need from it.
download_list = []
browser.switch_to_window(browser.window_handles[-1])
print("Waiting for next page to load...")
matches = WebDriverWait(browser, 20).until(EC.presence_of_element_located((By.XPATH, "//span[@id='OcorNroLbl']")))
print("Next page ready, found match element... counting")
countText = matches.text
countTotal = int(countText[countText.find("/")+1:])
print("A total of " + str(countTotal) + " matches have been found, standing by for page load.")

for i in range(1, countTotal+2):
    print("Waiting for page " + str(i-1) + " to load...")
    while(fLink in download_list):
        try:
            jIDElement = browser.find_element_by_xpath("//input[@name='HiddenBibAlias']")
            jPageElement = browser.find_element_by_xpath("//input[@name='hPagFis']")
            fLink = "http://memoria.bn.br/DocReader/" + jIDElement.get_attribute('value') + "/" + jPageElement.get_attribute('value') + "&pesq=" + search_text
        except:
            fails += 1
            time.sleep(1)
            if(fails == 10):
                print("Locked on a page, attempting to push to next.")
                nextPageButton = WebDriverWait(browser, 5).until(EC.presence_of_element_located((By.XPATH, "//input[@id='OcorPosBtn']")))
                nextPageButton.click()
                # raise
    while(fLink == ""):
        jIDElement = browser.find_element_by_xpath("//input[@name='HiddenBibAlias']")
        jPageElement = browser.find_element_by_xpath("//input[@name='hPagFis']")
        fLink = "http://memoria.bn.br/DocReader/" + jIDElement.get_attribute('value') + "/" + jPageElement.get_attribute('value') + "&pesq=" + search_text
    fails = 0
    print("Link obtained: " + fLink)
    download_list.append(fLink)
    if(i != countTotal):
        print("Moving to next page...")
        nextPageButton = WebDriverWait(browser, 5).until(EC.presence_of_element_located((By.XPATH, "//input[@id='OcorPosBtn']")))
        nextPageButton.click()
There are two "bugs" I'm trying to solve with this block. First, the very first page is always skipped in the loop (i.e. fLink = ""), even though there is a test in there for it; I'm not sure why this occurs. The other bug is that the code will hang on specific pages completely at random, and the only way out is to break the code execution.
This block has been modified a few times, so I know it's not the most "elegant" of solutions, but I'm starting to run out of time.
After taking a day off from this to think about it (and get some more sleep), I was able to figure out what was going on. The above code has three big faults. The first is that it does not distinguish the StaleElementReferenceException from the NoSuchElementException, both of which can occur while the page is shifting. Secondly, the loop condition iteratively checked that a page wasn't in the list, which on the first run let the blank condition load in directly, as the loop was never executed on that run (I should have used a do-while there, but I made other modifications instead). Finally, I made the silly error of only checking whether the first hidden element was changing, when in fact that is the journal ID, which is pretty much constant throughout.
The revisions began with an adaptation of code from another SO answer to implement a "hold" condition until one of the hidden elements changed:
from selenium.common.exceptions import StaleElementReferenceException
from selenium.common.exceptions import NoSuchElementException
import time

def hold_until_element_changed(driver, element1_xpath, element2_xpath, old_element1_text, old_element2_text):
    while True:
        try:
            element1 = driver.find_element_by_xpath(element1_xpath)
            element2 = driver.find_element_by_xpath(element2_xpath)
            if (element1.get_attribute('value') != old_element1_text) or (element2.get_attribute('value') != old_element2_text):
                break
        except StaleElementReferenceException:
            break
        except NoSuchElementException:
            return False
        time.sleep(1)
    return True
I then modified the original looping condition, going back to the original for-loop counter I had created, without the internal loop, instead calling the above function to create the "hold" until the page had flipped, and voila, it worked like a charm. (Note: I also increased the timeout on the next page button, as that was what caused the locking condition.)
for i in range(1, countTotal+1):
    print("Waiting for page " + str(i) + " to load...")
    bibxpath = "//input[@name='HiddenBibAlias']"
    pagexpath = "//input[@name='hPagFis']"
    jIDElement = WebDriverWait(browser, 20).until(EC.presence_of_element_located((By.XPATH, bibxpath)))
    jPageElement = WebDriverWait(browser, 20).until(EC.presence_of_element_located((By.XPATH, pagexpath)))
    jidtext = jIDElement.get_attribute('value')
    jpagetext = jPageElement.get_attribute('value')
    fLink = "http://memoria.bn.br/DocReader/" + jidtext + "/" + jpagetext + "&pesq=" + search_text
    print("Link obtained: " + fLink)
    download_list.append(fLink)
    if(i != countTotal):
        print("Moving to next page...")
        nextPageButton = WebDriverWait(browser, 20).until(EC.presence_of_element_located((By.XPATH, "//input[@id='OcorPosBtn']")))
        nextPageButton.click()
        # Wait for the next page to be ready
        change = hold_until_element_changed(browser, bibxpath, pagexpath, jidtext, jpagetext)
        if(change == False):
            print("Something went wrong.")
All in all, a good exercise in thought and some helpful links for me to consider when posting future questions. Thanks!
