I am trying to get details of the tyres on this page: https://eurawheels.com/fr/catalogue/INFINY-INDIVIDUAL. Each tyre has different FINITIONS, and the price and other details differ for each FINITION, so I would like to click on each FINITION type. The problem is that clicking a FINITION makes the previously found elements go stale, and I cannot refresh the page; if I do, it takes me back to the starting page. So, how can I avoid the stale element error without refreshing the page?
count_added = False
buttons_div = driver.find_elements_by_xpath('//div[@class="btn-group"]')
fin_buttons = buttons_div[2].find_elements_by_xpath('.//button')
fin_count = len(fin_buttons)
if fin_count > 2:
    for z in range(fin_count):
        if not count_added:
            z = z + 2  # Avoid clicking the Title
            count_added = True
        fin_buttons[z].click()
        finition = fin_buttons[z].text
        time.sleep(2)
        driver.refresh()  # Cannot do this. Will take to a different page
To clarify: the stale element exception is thrown because the element is no longer attached to the DOM. In your case it is buttons_div = driver.find_elements_by_xpath('//div[@class="btn-group"]') that is being used as the parent in fin_buttons[z].click().
To solve this you'll have to "refresh" the element once the DOM changes. You can do that like this:
from selenium import webdriver
from time import sleep

driver = webdriver.Chrome(executable_path="D:/chromedriver.exe")
driver.get("https://eurawheels.com/fr/catalogue/INFINY-INDIVIDUAL")
driver.maximize_window()
driver.find_elements_by_xpath("//div[@class='card-body text-center']/a")[1].click()

def click_fin_buttons(index):
    driver.find_elements_by_xpath('//div[@class="btn-group"]')[2].find_elements_by_xpath('.//button')[index].click()

def text_fin_buttons(index):
    return driver.find_elements_by_xpath('//div[@class="btn-group"]')[2].find_elements_by_xpath('.//button')[index].text

sleep(2)
count_added = False
buttons_div = driver.find_elements_by_xpath('//div[@class="btn-group"]')
fin_buttons = buttons_div[2].find_elements_by_xpath('.//button')
fin_count = len(fin_buttons)
if fin_count > 2:
    for z in range(fin_count):
        if not count_added:
            z = z + 2  # Avoid clicking the Title
            count_added = True
        click_fin_buttons(z)
        finition = text_fin_buttons(z)
        sleep(2)
        print(finition)
        # driver.refresh()  # Cannot do this. Will take to a different page
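A possible variant, sketched here and not tested against the live site, replaces the fixed sleeps with explicit waits while keeping the same re-find-on-every-access idea (it reuses the driver and locators from the snippet above):

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(driver, 10)

def fin_buttons():
    # Re-find the button group on every call so a stale reference is never reused
    group = wait.until(EC.presence_of_all_elements_located(
        (By.XPATH, '//div[@class="btn-group"]')))[2]
    return group.find_elements_by_xpath('.//button')

for z in range(2, len(fin_buttons())):   # start at 2 to skip the title buttons
    fin_buttons()[z].click()             # fresh lookup right before the click
    finition = fin_buttons()[z].text     # fresh lookup again after the DOM changed
    print(finition)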
I am trying to iterate over a list of links on a website (https://locations.traderjoes.com/ca/), but Selenium is not able to locate particular, seemingly random ones. In particular, I am trying to click on each of the cities and extract the number of stores using a for loop, but it always skips, say, "Alameda" among some other cities, even though I see nothing different about the HTML code.
driver = webdriver.Chrome(path)
driver.set_window_size(1120, 1000)
driver.get("https://locations.traderjoes.com/ca/")

num_stores_by_city = []
cities = driver.find_elements_by_class_name('itemlist')
for i in range(0, len(cities)):
    print(cities[i].text)
    if cities[i].is_displayed():
        cities[i].click()
        num = len(driver.find_elements_by_class_name('address-left'))
        num_stores_by_city.append(num)
        driver.find_element_by_xpath('//*[@id="content"]/a[2]').click()
    else:
        time.sleep(3)
        cities[i].click()
        num = len(driver.find_elements_by_class_name('address-left'))
        num_stores_by_city.append(num)
        driver.find_element_by_xpath('//*[@id="content"]/a[2]').click()
This will determine the cities and then loop through each gathering the number of stores and adding information to a dictionary type object:
driver = webdriver.Chrome(path)
url = 'https://locations.traderjoes.com/ca/'
driver.get(url)

city_list = {}
city_index = 0
processing_cities = True
while processing_cities:
    cities = driver.find_elements_by_css_selector('.itemlist a')
    if city_index < len(cities):
        city_text = cities[city_index].text
        cities[city_index].click()
        store_locations = driver.find_elements_by_css_selector('.itemlist')
        city_list[city_text] = len(store_locations)
        driver.get(url)
        city_index += 1
    else:
        processing_cities = False
print(city_list)
One of the issues you were running into is that once you click on an element, your previously found elements become stale; you need to re-find them to interact with them again.
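As a minimal sketch of that re-find pattern (the locator here is hypothetical, purely for illustration):

links_xpath = '//div[@class="results"]//a'  # hypothetical locator for illustration

count = len(driver.find_elements_by_xpath(links_xpath))
for i in range(count):
    # Re-run the query on every iteration: the click below invalidates the
    # references returned by the previous find_elements call.
    link = driver.find_elements_by_xpath(links_xpath)[i]
    link.click()
    driver.back()  # return to the listing before the next lookup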
I'm trying to loop through a dropdown menu at this URL: https://www.accuform.com/safety-sign/danger-danger-authorized-personnel-only-MADM006
So, for example, the first dropdown menu - under options - lists out different materials and I want to select each one in turn and then gather some other information from the webpage before moving on to the next material. Here is my current code:
from selenium import webdriver
from selenium.webdriver.support.ui import Select
from selenium.webdriver.common.action_chains import ActionChains
import time

driver = webdriver.Firefox()
driver.get('https://www.accuform.com/safety-sign/danger-danger-authorized-personnel-only-MADM006')
time.sleep(3)
driver.find_element_by_id('x-mark-icon').click()

select = Select(driver.find_element_by_name('Wiqj7mb4rsAq9LB'))
options = select.options
optionsList = []

driver.find_elements_by_class_name('select-wrapper')[0].click()
element = driver.find_element_by_xpath("//select[@name='Wiqj7mb4rsAq9LB']")
actions = ActionChains(driver)
actions.move_to_element(element).perform()
# driver.execute_script("arguments[0].scrollIntoView();", element)

for option in options:  # iterate over the options, place attribute value in list
    optionsList.append(option.get_attribute("value"))

for optionValue in optionsList:
    print("starting loop on option %s" % optionValue)
    # select = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//select[@name='Wiqj7mb4rsAq9LB']")))
    # select = Select(select)
    select.select_by_value(optionValue)
I started with just the loop, but got this error:
ElementNotInteractableException: Message: Element <option> could not be scrolled into view
I then added the WebDriverWait and got a TimeoutException error.
I then realized I should probably click on the wrapper in which the dropdown is held, so I added the click, which does pop up the menu, but I still got the TimeoutException.
So I thought maybe I should move to the element, which I tried with the action chain lines, and I got this error:
WebDriverException: Message: TypeError: rect is undefined
I tried to avoid that error by using this code instead:
# driver.execute_script("arguments[0].scrollIntoView();", element)
which just resulted in the TimeoutException again.
I'm pretty new to Python and Selenium and have basically just been modifying code from SO answers to similar questions, but nothing has worked.
I'm using Python 3.6 and the current versions of Selenium and the Firefox webdriver.
If anything is unclear or if you need more info just let me know.
Thanks so much!
EDIT: Based on the answer and comments by Kajal Kunda, I've updated my code to the following:
material_dropdown = driver.find_element_by_xpath("//input[@class='select-dropdown']")
driver.execute_script("arguments[0].click();", material_dropdown)
materials = driver.find_elements_by_css_selector("div.select-wrapper ul.dropdown-content li")
for material in materials:
    # material_dropdown = driver.find_element_by_xpath("//input[@class='select-dropdown']")
    # driver.execute_script("arguments[0].click();", material_dropdown)
    # materials = driver.find_elements_by_css_selector("div.select-wrapper ul.dropdown-content li")
    material_ele = material.find_element_by_tag_name('span')
    if material_ele.text != '':
        material_ele.click()
        time.sleep(5)
        price = driver.find_element_by_class_name("dataPriceDisplay")
        print(price.text)
The result is that it successfully prints the price for the first type of material, but then it returns:
StaleElementReferenceException: Message: The element reference of <li class=""> is stale;...
I've tried variations of having the commented-out lines inside and outside of the loop, but I always get a version of the StaleElementReferenceException error.
Any suggestions?
Thanks!
You could do the whole thing with requests. Grab the drop-down list from the options listed in the drop-down, then concatenate the value attributes into a requests URL that retrieves JSON containing all the info on the page. The same principle applies for adding in other drop-down values. The ids for each drop-down selection are the value attributes of the options in the drop-down, and they appear in the URL I show, separated by // for each drop-down selection.
import requests
from bs4 import BeautifulSoup as bs

url = 'https://www.accuform.com/product/getSku/danger-danger-authorized-personnel-only-MADM006/1/false/null//{}//WHFIw3xXmQx8zlz//6wr93DdrFo5JV//WdnO0RpwKpc4fGF'
startURL = 'https://www.accuform.com/safety-sign/danger-danger-authorized-personnel-only-MADM006'
res = requests.get(startURL)
soup = bs(res.content, 'lxml')
materials = [item['value'] for item in soup.select('#Wiqj7mb4rsAq9LB option')]
sizes = [item['value'] for item in soup.select('#WvXESrTyQjM3Ciw option')]
languages = [item['value'] for item in soup.select('#WUYWGMePtpmpmhy option')]
units = [item['value'] for item in soup.select('#W91eqaJ0WPXwe9b option')]

for material in materials:
    data = requests.get(url.format(material)).json()
    soup = bs(data['dataMaterialBullets'], 'lxml')
    lines = [item.text for item in soup.select('li')]
    print(lines)
    print(data['dataPriceDisplay'])
    # etc......
Sample of JSON:
Try the below code. It should work.
import time
from selenium import webdriver

driver = webdriver.Firefox()
driver.get('https://www.accuform.com/safety-sign/danger-danger-authorized-personnel-only-MADM006')
time.sleep(3)
driver.find_element_by_id('x-mark-icon').click()

material_dropdown = driver.find_element_by_xpath("//input[@class='select-dropdown']")
driver.execute_script("arguments[0].click();", material_dropdown)

# Code for material dropdown
materials = driver.find_elements_by_css_selector("div.select-wrapper ul.dropdown-content li")
material_optionsList = []
for material in materials:
    material_ele = material.find_element_by_tag_name('span')
    if material_ele.text != '':
        material_optionsList.append(material_ele.text)
print(material_optionsList)
driver.execute_script("arguments[0].click();", material_dropdown)

size_dropdown = driver.find_element_by_xpath("(//input[@class='select-dropdown'])[2]")
driver.execute_script("arguments[0].click();", size_dropdown)

# Code for size dropdown
Sizes = driver.find_elements_by_css_selector("div.select-wrapper ul.dropdown-content li")
size_optionsList = []
for size in Sizes:
    size_ele = size.find_element_by_tag_name('span')
    if size_ele.text != '':
        size_optionsList.append(size_ele.text)
driver.execute_script("arguments[0].click();", size_dropdown)
Output:
[u'Adhesive Vinyl', u'Plastic', u'Adhesive Dura-Vinyl', u'Aluminum', u'Dura-Plastic\u2122', u'Aluma-Lite\u2122', u'Dura-Fiberglass\u2122', u'Accu-Shield\u2122']
Hope you will do the remaining. Let me know if it works for you.
EDIT: Code to loop through and get the price value of the materials.
for material in range(len(materials)):
    material_ele = materials[material]
    if material_ele.text != '':
        # material_optionsList.append(material_ele.text)
        # material_ele.click()
        driver.execute_script("arguments[0].click();", material_ele)
        time.sleep(2)
        price = driver.find_element_by_id("priceDisplay")
        print(price.text)
        time.sleep(2)
        material_dropdown = driver.find_element_by_xpath("//input[@class='select-dropdown']")
        driver.execute_script("arguments[0].click();", material_dropdown)
        materials = driver.find_elements_by_css_selector("div.select-wrapper ul.dropdown-content li")
        material += 2
Output:
$8.31
$9.06
$13.22
$15.91
$15.91
I have a Python script using Selenium to go to a given Instagram profile and iterate over the user's followers. On the Instagram website, when one clicks to see the list of followers, a pop-up opens with the accounts listed.
However, both visually and in the HTML, only 12 accounts are shown. In order to see more, one has to scroll down, so I tried doing this with the Keys.PAGE_DOWN input.
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.keys import Keys
import time
...
username = 'Username'
password = 'Password'
message = 'blahblah'
tryTime = 2
#create driver and log in
driver = webdriver.Chrome()
logIn(driver, username, password, tryTime)
#gets rid of preference pop-up
a = driver.find_elements_by_class_name("HoLwm")
a[0].click()
#go to profile
driver.get("https://www.instagram.com/{}/".format(username))
#go to followers list
followers = driver.find_element_by_xpath("//a[@href='/{}/followers/']".format(username))
followers.click()
time.sleep(tryTime)

#find all li elements in list
fBody = driver.find_element_by_xpath("//div[@role='dialog']")
fBody.send_keys(Keys.PAGE_DOWN)
fList = fBody.find_elements_by_tag_name("li")
print("fList len is {}".format(len(fList)))
time.sleep(tryTime)
print("ended")
driver.quit()
When I try to run this I get the following error:
Message: unknown error: cannot focus element
I know this is probably because I'm using the wrong element for fBody, but I don't know which would be the right one. Does anybody know which element I should send the PAGE_DOWN key to, or if there is another way to load the accounts?
Any help is much appreciated!
The element you're looking for is //div[@class='isgrP'], and Keys.PAGE_DOWN does not work for a scrollable div.
Your variable fList holds the old value; you need to find the elements again after the scroll.
#find all li elements in list
fBody = driver.find_element_by_xpath("//div[@class='isgrP']")
scroll = 0
while scroll < 5:  # scroll 5 times
    driver.execute_script('arguments[0].scrollTop = arguments[0].scrollTop + arguments[0].offsetHeight;', fBody)
    time.sleep(tryTime)
    scroll += 1

fList = driver.find_elements_by_xpath("//div[@class='isgrP']//li")
print("fList len is {}".format(len(fList)))

print("ended")
#driver.quit()
The above code works fine if you add an iteration (a for loop) with range:
for i in range(1, 4):
    try:
        #find all li elements in list
        fBody = self.driver.find_element_by_xpath("//div[@class='isgrP']")
        scroll = 0
        while scroll < 5:  # scroll 5 times
            self.driver.execute_script('arguments[0].scrollTop = arguments[0].scrollTop + arguments[0].offsetHeight;', fBody)
            time.sleep(2)
            scroll += 1
        fList = self.driver.find_elements_by_xpath("//div[@class='isgrP']//li")
        print("fList len is {}".format(len(fList)))
    except Exception as e:
        print(e, "cannot scroll")

    try:
        #get tags with a
        hrefs_in_view = self.driver.find_elements_by_tag_name('a')
        # finding relevant hrefs
        hrefs_in_view = [elem.get_attribute('title') for elem in hrefs_in_view]
        [pic_hrefs.append(title) for title in hrefs_in_view if title not in pic_hrefs]
        print("Check: pic href length " + str(len(pic_hrefs)))
    except Exception as tag:
        print(tag, "cannot find tag")
So, the for loop makes it possible to scroll even if the while loop misses.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.wait import WebDriverWait
from bs4 import BeautifulSoup
import time
url = "https://www.bungol.ca/"
driver = webdriver.Firefox(executable_path='/usr/local/bin/geckodriver')
driver.get(url)

#Select toronto by default
driver.find_element_by_xpath("""/html/body/section/div[2]/div/div[1]/form/div/select/optgroup[1]/option[1]""").click()
time.sleep(1)
driver.find_element_by_xpath("""/html/body/section/div[2]/div/div[1]/form/div/button""").click()
driver.find_element_by_xpath("""/html/body/nav/div[1]/ul[1]/li[3]/select/option[8]""").click()
#select last 2 years
driver.find_element_by_xpath("""//*[@id="activeListings"]""").click()
#opening sold listing in that area
driver.find_element_by_xpath("""/html/body/div[5]/i""").click()  # closes property type slide
driver.find_element_by_xpath("""//*[@id="navbarDropdown"]""").click()
driver.find_element_by_xpath("""//*[@id="listViewToggle"]""").click()
def data_collector():
    hidden_next = driver.find_element_by_class_name("nextPaginate")

    #inputs in textbox
    inputElement = driver.find_element_by_id('navbarSearchAddressInput')
    inputElement.send_keys('M3B2B6')
    time.sleep(1)
    #inputElement.send_keys(Keys.ENTER)

    row_count = 3
    table = driver.find_elements_by_css_selector("""#listViewTableBody""")
    while hidden_next.is_displayed():  # while there is a next page button to be pressed
        time.sleep(3)  # delay for table refresh
        #row_count = len(driver.find_elements_by_css_selector("""html body#body div#listView.table-responsive table#listViewTable.table.table-hover.mb-0 tbody#listViewTableBody tr.mb-2"""))
        for row in range(row_count):  # loop through the rows found
            #alternate row by changing the tr index
            driver.find_element_by_xpath("""/html/body/div[8]/table/tbody/tr[""" + str(row + 1) + """]/td[1]""").click()
            time.sleep(2)
            print(driver.find_element_by_css_selector("""#listingStatus""").text)  # sold price
            #closes the pop up after getting the data
            driver.find_element_by_css_selector('.modal-xl > div:nth-child(1) > div:nth-child(1) > button:nth-child(1)').click()
            time.sleep(1)
        #clicks next page button for the table
        driver.find_element_by_xpath("""//*[@id="listViewNextPaginate"]""").click()

if __name__ == "__main__":
    data_collector()
The code loops through all the rows in the first table (currently set to 3 for testing), clicks on each row so the pop-up shows up, grabs the information, and closes the pop-up. But when it moves to the next page, it doesn't click on any of the rows of the second page. It doesn't show an error for not finding the row xpath either; instead it shows an error for the pop-up window's close button, because the pop-up never opened since the row was never clicked.
How do I make it click the rows when the table flips to the next page?
for table reference:
https://www.bungol.ca/map/location/toronto/?
close the property slider on the left
click tool -> open list
In my browser I also can't open the pop-up when I click on a row on the second page, so I think this may be the fault of the website.
If you want to check whether the element exists, you can use this code:
from selenium.common.exceptions import NoSuchElementException

def check_exists_by_xpath(xpath, driver):
    try:
        driver.find_element_by_xpath(xpath)
    except NoSuchElementException:
        return False
    return True
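For example, the helper could guard the row click like this (using the first-row XPath from the question):

row_xpath = '/html/body/div[8]/table/tbody/tr[1]/td[1]'
if check_exists_by_xpath(row_xpath, driver):
    driver.find_element_by_xpath(row_xpath).click()
else:
    print("Row not found, skipping this page.")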
Try this. My understanding is that your script goes through the listings, opens a listing, grabs the listing status, closes the listing, and does the same for all the listings.
If my understanding is correct, the code below may help you. It would be better to change the implicit waits and time.sleep() calls to explicit waits and to clean up the functions.
Having said that, I did not fully test the code, but it did navigate to more than one page of listings and collect data.
from selenium.webdriver import Firefox
from selenium.webdriver.support.select import Select
import time

driver = Firefox(executable_path=r'path to geckodriver.exe')
driver.get('https://www.bungol.ca/')
driver.maximize_window()
driver.implicitly_wait(10)

# Select toronto by default
driver.find_element_by_css_selector('#locationChoice button[type="submit"]').click()
sold_in_the_last = Select(driver.find_element_by_id('soldInTheLast'))
sold_in_the_last.select_by_visible_text('2 Years')
driver.find_element_by_id('activeListings').click()

# opening sold listing in that area
driver.find_element_by_css_selector('#leftSidebarClose>i').click()
driver.find_element_by_id('navbarDropdown').click()
driver.find_element_by_id('listViewToggle').click()

def get_listings():
    listings_table = driver.find_element_by_id('listViewTableBody')
    listings_table_rows = listings_table.find_elements_by_tag_name('tr')
    return listings_table_rows

def get_sold_price(listing):
    listing.find_element_by_css_selector('td:nth-child(1)').click()
    time.sleep(2)
    sold_price = driver.find_element_by_id('listingStatus').text
    time.sleep(2)
    close = driver.find_elements_by_css_selector('.modal-content>.modal-body>button[class="close"]')
    close[2].click()
    time.sleep(2)
    return sold_price

def data_collector():
    data = []
    time.sleep(2)
    next = driver.find_element_by_id('listViewNextPaginate')
    # get all the listings prior to the last page
    while next.is_displayed():
        listings = get_listings()
        for listing in listings:
            data.append(get_sold_price(listing))
        next.click()
    # get listings from last page
    listings = get_listings()
    for listing in listings:
        data.append(get_sold_price(listing))
    return data

if __name__ == '__main__':
    from pprint import pprint
    data = data_collector()
    pprint(data)
    print(len(data))
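As noted above, the time.sleep() calls could be swapped for explicit waits. A rough, untested sketch of what get_sold_price might look like with WebDriverWait, keeping the same locators:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def get_sold_price(listing):
    wait = WebDriverWait(driver, 10)
    listing.find_element_by_css_selector('td:nth-child(1)').click()
    # wait for the pop-up's status field instead of sleeping
    sold_price = wait.until(EC.presence_of_element_located(
        (By.ID, 'listingStatus'))).text
    # wait for the close buttons to render, then click the third one
    # (index 2), as in the original code
    wait.until(EC.presence_of_all_elements_located(
        (By.CSS_SELECTOR, '.modal-content>.modal-body>button[class="close"]')))
    driver.find_elements_by_css_selector(
        '.modal-content>.modal-body>button[class="close"]')[2].click()
    return sold_price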
I've been working on a research project that is looking to obtain a list of reference articles from the Brazil Hemeroteca (the desired page reference, http://memoria.bn.br/DocReader/720887x/839, needs to be collected from two hidden elements on the following page: http://memoria.bn.br/DocReader/docreader.aspx?bib=720887x&pasta=ano%20189&pesq=Milho). I asked a question a few weeks back that was answered, and I was able to get things running well in that regard, but now I've hit a new snag and I'm not exactly sure how to get around it.
The problem is that after the first form is filled in, the page redirects to a second page, a JavaScript/AJAX-enabled page through which I need to spool all of the matches, which is done by clicking a button at the top of the page. The problem I'm encountering is that when clicking the next-page button I'm dealing with elements on the page that are updating, which leads to stale elements. I've tried to implement a few pieces of code to detect when this "stale" effect occurs, to indicate the page has changed, but this has not brought much luck. Here is the code I've implemented:
import urllib
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select
import time

saveDir = "C:/tmp"

print("Opening Page...")
browser = webdriver.Chrome()
url = "http://bndigital.bn.gov.br/hemeroteca-digital/"
browser.get(url)

print("Searching for elements")
fLink = ""
fails = 0
frame_ref = browser.find_elements_by_tag_name("iframe")[0]
iframe = browser.switch_to.frame(frame_ref)
journal = browser.find_element_by_id("PeriodicoCmb1_Input")

search_journal = "Relatorios dos Presidentes dos Estados Brasileiros (BA)"
search_timeRange = "1890 - 1899"
search_text = "Milho"

xpath_form = "//input[@name='PesquisarBtn1']"
xpath_journal = "//li[text()='" + search_journal + "']"
xpath_timeRange = "//input[@name='PeriodoCmb1' and not(@disabled)]"
xpath_timeSelect = "//li[text()='" + search_timeRange + "']"
xpath_searchTerm = "//input[@name='PesquisaTxt1']"

print("Locating Journal/Periodical")
journal.click()
dropDownJournal = WebDriverWait(browser, 60).until(EC.presence_of_element_located((By.XPATH, xpath_journal)))
dropDownJournal.click()

print("Waiting for Time Selection")
try:
    timeRange = WebDriverWait(browser, 20).until(EC.presence_of_element_located((By.XPATH, xpath_timeRange)))
    timeRange.click()
    time.sleep(1)
    print("Locating Time Range")
    dropDownTime = WebDriverWait(browser, 20).until(EC.presence_of_element_located((By.XPATH, xpath_timeSelect)))
    dropDownTime.click()
    time.sleep(1)
except:
    print("Failed...")

print("Adding Search Term")
searchTerm = WebDriverWait(browser, 20).until(EC.presence_of_element_located((By.XPATH, xpath_searchTerm)))
searchTerm.clear()
searchTerm.send_keys(search_text)
time.sleep(5)

print("Perform search")
submitButton = WebDriverWait(browser, 20).until(EC.presence_of_element_located((By.XPATH, xpath_form)))
submitButton.click()

# Wait for the second page to load, pull what we need from it.
download_list = []
browser.switch_to_window(browser.window_handles[-1])
print("Waiting for next page to load...")
matches = WebDriverWait(browser, 20).until(EC.presence_of_element_located((By.XPATH, "//span[@id='OcorNroLbl']")))
print("Next page ready, found match element... counting")
countText = matches.text
countTotal = int(countText[countText.find("/") + 1:])
print("A total of " + str(countTotal) + " matches have been found, standing by for page load.")

for i in range(1, countTotal + 2):
    print("Waiting for page " + str(i - 1) + " to load...")
    while fLink in download_list:
        try:
            jIDElement = browser.find_element_by_xpath("//input[@name='HiddenBibAlias']")
            jPageElement = browser.find_element_by_xpath("//input[@name='hPagFis']")
            fLink = "http://memoria.bn.br/DocReader/" + jIDElement.get_attribute('value') + "/" + jPageElement.get_attribute('value') + "&pesq=" + search_text
        except:
            fails += 1
            time.sleep(1)
            if fails == 10:
                print("Locked on a page, attempting to push to next.")
                nextPageButton = WebDriverWait(browser, 5).until(EC.presence_of_element_located((By.XPATH, "//input[@id='OcorPosBtn']")))
                nextPageButton.click()
                #raise
    while fLink == "":
        jIDElement = browser.find_element_by_xpath("//input[@name='HiddenBibAlias']")
        jPageElement = browser.find_element_by_xpath("//input[@name='hPagFis']")
        fLink = "http://memoria.bn.br/DocReader/" + jIDElement.get_attribute('value') + "/" + jPageElement.get_attribute('value') + "&pesq=" + search_text
    fails = 0
    print("Link obtained: " + fLink)
    download_list.append(fLink)
    if i != countTotal:
        print("Moving to next page...")
        nextPageButton = WebDriverWait(browser, 5).until(EC.presence_of_element_located((By.XPATH, "//input[@id='OcorPosBtn']")))
        nextPageButton.click()
There are two "bugs" I'm trying to solve with this block. First, the very first page is always skipped in the loop (i.e. fLink = ""), even though there is a test in there for it; I'm not sure why this occurs. The other bug is that the code will hang on specific pages completely at random, and the only way out is to break the code execution.
This block has been modified a few times so I know it's not the most "elegant" of solutions, but I'm starting to run out of time.
After taking a day off from this to think about it (and get some more sleep), I was able to figure out what was going on. The above code has three big faults. The first is that it does not handle the StaleElementReferenceException versus the NoSuchElementException, either of which can occur while the page is shifting. Secondly, the loop condition checked iteratively that a page wasn't in the list, which on the first run let the blank condition load in directly, as the loop was never executed (I should have used a do-while there, but I made more modifications instead). Finally, I made the silly error of only checking whether the first hidden element was changing, when in fact that is the journal ID, which is pretty much constant throughout.
The revisions began with an adaptation of code from another SO answer to implement a "hold" condition until either of the hidden elements changed:
from selenium.common.exceptions import StaleElementReferenceException
from selenium.common.exceptions import NoSuchElementException

def hold_until_element_changed(driver, element1_xpath, element2_xpath, old_element1_text, old_element2_text):
    while True:
        try:
            element1 = driver.find_element_by_xpath(element1_xpath)
            element2 = driver.find_element_by_xpath(element2_xpath)
            if (element1.get_attribute('value') != old_element1_text) or (element2.get_attribute('value') != old_element2_text):
                break
        except StaleElementReferenceException:
            break
        except NoSuchElementException:
            return False
        time.sleep(1)
    return True
I then modified the original looping condition, going back to the original for-loop counter I had created, without an internal loop, instead calling the above function to create the "hold" until the page had flipped, and voilà, it worked like a charm. (Note: I also upped the timeout on the next-page button, as this is what caused the locking condition.)
for i in range(1, countTotal + 1):
    print("Waiting for page " + str(i) + " to load...")
    bibxpath = "//input[@name='HiddenBibAlias']"
    pagexpath = "//input[@name='hPagFis']"
    jIDElement = WebDriverWait(browser, 20).until(EC.presence_of_element_located((By.XPATH, bibxpath)))
    jPageElement = WebDriverWait(browser, 20).until(EC.presence_of_element_located((By.XPATH, pagexpath)))
    jidtext = jIDElement.get_attribute('value')
    jpagetext = jPageElement.get_attribute('value')
    fLink = "http://memoria.bn.br/DocReader/" + jidtext + "/" + jpagetext + "&pesq=" + search_text
    print("Link obtained: " + fLink)
    download_list.append(fLink)
    if i != countTotal:
        print("Moving to next page...")
        nextPageButton = WebDriverWait(browser, 20).until(EC.presence_of_element_located((By.XPATH, "//input[@id='OcorPosBtn']")))
        nextPageButton.click()
        # Wait for next page to be ready
        change = hold_until_element_changed(browser, bibxpath, pagexpath, jidtext, jpagetext)
        if change == False:
            print("Something went wrong.")
All in all, a good exercise in thought and some helpful links for me to consider when posting future questions. Thanks!