Python Selenium find_element_by_id with default if not found

Python Selenium find_element_by_id with default if not found - python

I would like to find an element on the page by id. The issue is that this element is only temporary and will not exist all of the time. Therefore I would like to set the default value so I can check for it in a conditional like so:
covidPopUp = driver.find_element_by_id("sgpb-popup-dialog-main-
div").extract(default='not-found')
if(covidPopUp == 'not-found'):
load_more_btn = driver.find_element_by_id("load_more_button")
load_more_btn.click()
else:
popUpClose = driver.find_element_by_id("sgpb-popup-close-button-6")
popUpClose.click()
However, this produces the following error:
AttributeError: 'WebElement' object has no attribute 'extract'

I think you have a couple of options.
Use find_element_by_id but catch the exception
from selenium.common.exceptions import NoSuchElementException
try:
covidPopUp = driver.find_element_by_id("sgpb-popup-dialog-main-div")
except NoSuchElementException:
covidPopUp = "not-found"
Use find_elements_by_id (note plural) and check if list is not empty.
covidPopUp = driver.find_elements_by_id("sgpb-popup-dialog-main-div")
covidPopUp = covidPopUp[0] if covidPopUp else "not-found"

Related

Python Web Scraping error - Reading from JSON- IndexError: list index out of range - how do I ignore

I am performing web scraping via Python \ Selenium \ Chrome headless driver. I am reading the results from JSON - here is my code:
CustId=500
while (CustId<=510):
print(CustId)
# Part 1: Customer REST call:
urlg = f'https://mywebsite/customerRest/show/?id={CustId}'
driver.get(urlg)
soup = BeautifulSoup(driver.page_source,"lxml")
dict_from_json = json.loads(soup.find("body").text)
# print(dict_from_json)
#try:
CustID = (dict_from_json['customerAddressCreateCommand']['customerId'])
# Addr = (dict_from_json['customerShowCommand']['customerAddressShowCommandSet'][0]['addressDisplayName'])
writefunction()
CustId = CustId+1
The issue is sometimes 'addressDisplayName' will be present in the result set and sometimes not. If its not, it errors with the error:
IndexError: list index out of range
Which makes sense, as it doesn't exist. How do I ignore this though - so if 'addressDisplayName' doesn't exist just continue with the loop? I've tried using a TRY but the code still stops executing.

try..except block should resolved your issue.
CustId=500
while (CustId<=510):
print(CustId)
# Part 1: Customer REST call:
urlg = f'https://mywebsite/customerRest/show/?id={CustId}'
driver.get(urlg)
soup = BeautifulSoup(driver.page_source,"lxml")
dict_from_json = json.loads(soup.find("body").text)
# print(dict_from_json)
CustID = (dict_from_json['customerAddressCreateCommand']['customerId'])
try:
Addr = (dict_from_json['customerShowCommand']['customerAddressShowCommandSet'][0]'addressDisplayName'])
except:
Addr ="NaN"
CustId = CustId+1

If you get an IndexError (with an index of '0') it means that your list is empty. So it is one step in the path earlier (otherwise you'd get a KeyError if 'addressDisplayName' was missing from the dict).
You can check if the list has elements:
if dict_from_json['customerShowCommand']['customerAddressShowCommandSet']:
# get the data
Otherwise you can indeed use try..except:
try:
# get the data
except IndexError, KeyError:
# handle missing data

How to iterate trough a list of web elements that is refreshing every 10 sec?

I am trying to iterate through a list that refreshes every 10 sec.
this is what I have tried:
driver.get("https://www.winmasters.ro/ro/live-betting/")
events = driver.find_elements_by_css_selector('.event-wrapper.v1.event-live.odds-hidden.event-sport-1')
for i in range(len(events)):
try:
event = events[i]
name = event.find_element_by_css_selector('.event-details-team-name.event-details-team-a')# the error occurs here
except: # NoSuchElementException or StaleElementReferenceException
time.sleep(3) # i have tried up to 20 sec
event = events[i]
name = event.find_element_by_css_selecto('.event-details-team-name.event-details-team-a')
this did not work so I tried another except
except: # second try that also did not work
element = WebDriverWait(driver, 20).until(
EC.presence_of_all_elements_located((By.CSS_SELECTOR, '.event-details-team-name.event-details-team-a'))
)
name = event.find_element_by_css_selecto('.event-details-team-name.event-details-team-a')
Now I am assigning something that I will never use to name like:
try:
event = events[i]
name = event.find_element_by_css_selector('.event-details-team-name.event-details-team-a')
except:
name = "blablabla"
With this code when the page refreshes I get about 7 or 8 of the "blablabla" until it finds my selector again from the webpage

You can get all required data using JavaScript.
Code below will give you list of events map with all details instantly and without NoSuchElementException or StaleElementReferenceException errors:
me_id : unique identificator
href : href with details which you can use to get details
team_a : name of the first team
team_a_score : score of the first team
team_b : name of the second team
team_b_score : score of the second team
event_status : status of the event
event_clock : time of the event
def events = driver.execute_script('return [...document.querySelectorAll(\'[data-uat="live-betting-overview-leagues"] .events-for-league .event-live\')].map(e=>{return {me_id:e.getAttribute("me_id"), href:e.querySelector("a.event-details-live").href, team_a:e.querySelector(".event-details-team-a").textContent, team_a_score:e.querySelector(".event-details-score-1").textContent, team_b:e.querySelector(".event-details-team-b").textContent, team_b_score:e.querySelector(".event-details-score-2").textContent, event_status:e.querySelector(\'[data-uat="event-status"]\').textContent, event_clock:e.querySelector(\'[data-uat="event-clock"]\').textContent}})')
for event in events:
print(event.get('me_id'))
print(event.get('href')) #using href you can open event details using: driver.get(event.get('href'))
print(event.get('team_a'))
print(event.get('team_a_score'))
print(event.get('team_b'))
print(event.get('team_b_score'))
print(event.get('event_status'))
print(event.get('event_clock'))

One primary problem is that you are acquiring all of the elements up front, and then iterating through that list. As the page itself is updating frequently, the elements you've already acquired have gone "stale", meaning they are not long associated with current DOM objects. When you try to use those stale elements, Selenium throw StaleElementReferenceExceptions because it has no way of doing anything with those now out-of-date objects.
One way to overcome this is to only acquire and use an element right as you need it, rather than fetching them all up front. I personally feel the cleanest approach is to use the CSS :nth-child() approach:
from selenium import webdriver
def main():
base_css = '.event-wrapper.v1.event-live.odds-hidden.event-sport-1'
driver = webdriver.Chrome()
try:
driver.get("https://www.winmasters.ro/ro/live-betting/")
# Get a list of all elements
events = driver.find_elements_by_css_selector(base_css)
print("Found {} events".format(len(events)))
# Iterate through the list, keeping track of the index
# note that nth-child referencing begins at index 1, not 0
for index, _ in enumerate(events, 1):
name = driver.find_element_by_css_selector("{}:nth-child({}) {}".format(
base_css,
index,
'.event-details-team-name.event-details-team-a'
))
print(name.text)
finally:
driver.quit()
if __name__ == "__main__":
main()
If I run the above script, I get this output:
$ python script.py
Found 2 events
Hapoel Haifa
FC Ashdod
Now, as the underlying webpage really does update a lot, there is still a decent chance you can get a SERE error. To overcome that you can use a retry decorator (pip install retry to get the package) to handle the SERE and reacquire the element:
import retry
from selenium import webdriver
from selenium.common.exceptions import StaleElementReferenceException
#retry.retry(StaleElementReferenceException, tries=3)
def get_name(driver, selector):
elem = driver.find_element_by_css_selector(selector)
return elem.text
def main():
base_css = '.event-wrapper.v1.event-live.odds-hidden.event-sport-1'
driver = webdriver.Chrome()
try:
driver.get("https://www.winmasters.ro/ro/live-betting/")
events = driver.find_elements_by_css_selector(base_css)
print("Found {} events".format(len(events)))
for index, _ in enumerate(events, 1):
name = get_name(
driver,
"{}:nth-child({}) {}".format(
base_css,
index,
'.event-details-team-name.event-details-team-a'
)
)
print(name)
finally:
driver.quit()
if __name__ == "__main__":
main()
Now, despite the above examples, I think you still have issues with your CSS selectors, which is the primary reason for the NoSuchElement exceptions. I can't help with that without a better description of what you are actually trying to accomplish with this script.

If Elif issue / Selenium Python

I made the following code to scrape some website. A list of product code is itered on research bar with Selenium. If there is no result found (if driver.find_element_by_css_selector("div[class='search-did-you-mean']"):) i just clear the research bar to make another search. If there is some results (elif driver.find_element_by_css_selector("div[class='result-search']"):) I scrape it
Here is the code :
for product in product_list:
inputElement = driver.find_element_by_id("q")
inputElement.send_keys(product[0])
inputElement.send_keys(Keys.ENTER)
inputElement.click()
time.sleep(5)
if driver.find_element_by_css_selector("div[class='search-did-you-mean']"):
time.sleep(5)
clearResearch = driver.find_element_by_id("q")
WebDriverWait(driver, 10).until_not(EC.visibility_of_element_located((By.ID, "overley")))
clearResearch.send_keys(Keys.CONTROL + "a")
clearResearch.send_keys(Keys.DELETE)
elif driver.find_element_by_css_selector("div[class='result-search']"):
time.sleep(5)
item['price'] = driver.find_element_by_css_selector("span[class='sale-price']").text
item['desc'] = driver.find_element_by_css_selector("h3[class='product-name']").text
print(item)
There is no result for the first product code of the list, so it is cleared and a new code is given. Problem appears with the second item, there is results but my elif condition seems not understand as I get an Unable to locate element: div[class='search-did-you-mean'] error.
Do you know what is wrong with my code ? Thanks a lot

This is selenium behavior will throw exception if no element found, wrap it in try-except
first_product = None
try:
first_product = driver.find_element_by_css_selector("div[class='search-did-you-mean']"
except: pass
if first_product:
.....

You can use find_elements_by_css_selector and check if the returned list has elements in it
if driver.find_elements_by_css_selector("div[class='search-did-you-mean']"):
#...
elif driver.find_elements_by_css_selector("div[class='result-search']"):
#...

Find XPath Paragraph (following-sibling?)

I'm using Selenium with Python 2.7.10 and would like to grab the paragraph following the "Description" header on this page: http://etfdb.com/etf/ROBO/
from selenium import webdriver as driver
from selenium.common.exceptions import NoSuchElementException
def scrape(driver, key):
try:
find_value = driver.find_element_by_xpath("//span[#class='panel__sub-heading' and . = '%s']/following-sibling::p" % key).text
except NoSuchElementException:
print "Not Found"
return None
else:
value = re.search(r"(.+)", find_value).group().encode("utf-8")
print value
return value
description = scrape(driver, "Description")
The XPath I'm using is incorrect because it yields no result. What would be the correct way to find the paragraph following the header "Description"?

This is not a span tag - it's h3:
//h3[#class='panel__sub-heading' and . = '%s']/following-sibling::p

Python Selenium wait for several elements to load

I have a list, which is dynamically loaded by AJAX.
At first, while loading, it's code is like this:
<ul><li class="last"><a class="loading" href="#"><ins> </ins>Загрузка...</a></li></ul>
When the list is loaded, all of it li and a are changed. And it's always more than 1 li.
Like this:
<ul class="ltr">
<li id="t_b_68" class="closed" rel="simple">
<a id="t_a_68" href="javascript:void(0)">Category 1</a>
</li>
<li id="t_b_64" class="closed" rel="simple">
<a id="t_a_64" href="javascript:void(0)">Category 2</a>
</li>
...
I need to check if list is loaded, so I check if it has several li.
So far I tried:
1) Custom waiting condition
class more_than_one(object):
def __init__(self, selector):
self.selector = selector
def __call__(self, driver):
elements = driver.find_elements_by_css_selector(self.selector)
if len(elements) > 1:
return True
return False
...
try:
query = WebDriverWait(driver, 30).until(more_than_one('li'))
except:
print "Bad crap"
else:
# Then load ready list
2) Custom function based on find_elements_by
def wait_for_several_elements(driver, selector, min_amount, limit=60):
"""
This function provides awaiting of <min_amount> of elements found by <selector> with
time limit = <limit>
"""
step = 1 # in seconds; sleep for 500ms
current_wait = 0
while current_wait < limit:
try:
print "Waiting... " + str(current_wait)
query = driver.find_elements_by_css_selector(selector)
if len(query) > min_amount:
print "Found!"
return True
else:
time.sleep(step)
current_wait += step
except:
time.sleep(step)
current_wait += step
return False
This doesn't work, because driver (current element passed to this function) gets lost in DOM. UL isn't changed but Selenium can't find it anymore for some reason.
3) Excplicit wait. This just sucks, because some lists are loaded instantly and some take 10+ secs to load. If I use this technique I have to wait max time every occurence, which is very bad for my case.
4) Also I can't wait for child element with XPATH correctly. This one just expects ul to appear.
try:
print "Going to nested list..."
#time.sleep(WAIT_TIME)
query = WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.XPATH, './/ul')))
nested_list = child.find_element_by_css_selector('ul')
Please, tell me the right way to be sure, that several heir elements are loaded for specified element.
P.S. All this checks and searches should be relative to current element.

First and foremost the elements are AJAX elements.
Now, as per the requirement to locate all the desired elements and create a list, the simplest approach would be to induce WebDriverWait for the visibility_of_all_elements_located() and you can use either of the following Locator Strategies:
Using CSS_SELECTOR:
elements = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "ul.ltr li[id^='t_b_'] > a[id^='t_a_'][href]")))
Using XPATH:
elements = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//ul[#class='ltr']//li[starts-with(#id, 't_b_')]/a[starts-with(#id, 't_a_') and starts-with(., 'Category')]")))
Note : You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
Incase your usecase is to wait for certain number of elements to be loaded e.g. 10 elements, you can use you can use the lambda function as follows:
Using >:
myLength = 9
WebDriverWait(driver, 20).until(lambda driver: len(driver.find_elements_by_xpath("//ul[#class='ltr']//li[starts-with(#id, 't_b_')]/a[starts-with(#id, 't_a_') and starts-with(., 'Category')]")) > int(myLength))
Using ==:
myLength = 10
WebDriverWait(driver, 20).until(lambda driver: len(driver.find_elements_by_xpath("//ul[#class='ltr']//li[starts-with(#id, 't_b_')]/a[starts-with(#id, 't_a_') and starts-with(., 'Category')]")) == int(myLength))
You can find a relevant discussion in How to wait for number of elements to be loaded using Selenium and Python
References
You can find a couple of relevant detailed discussions in:
Getting specific elements in selenium
Cannot find table element from div element in selenium python
Extract text from an aria-label selenium webdriver (python)

I created AllEc which basically piggybacks on WebDriverWait.until logic.
This will wait until the timeout occurs or when all of the elements have been found.
from typing import Callable
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import StaleElementReferenceException
class AllEc(object):
def __init__(self, *args: Callable, description: str = None):
self.ecs = args
self.description = description
def __call__(self, driver):
try:
for fn in self.ecs:
if not fn(driver):
return False
return True
except StaleElementReferenceException:
return False
# usage example:
wait = WebDriverWait(driver, timeout)
ec1 = EC.invisibility_of_element_located(locator1)
ec2 = EC.invisibility_of_element_located(locator2)
ec3 = EC.invisibility_of_element_located(locator3)
all_ec = AllEc(ec1, ec2, ec3, description="Required elements to show page has loaded.")
found_elements = wait.until(all_ec, "Could not find all expected elements")
Alternatively I created AnyEc to look for multiple elements but returns on the first one found.
class AnyEc(object):
"""
Use with WebDriverWait to combine expected_conditions in an OR.
Example usage:
>>> wait = WebDriverWait(driver, 30)
>>> either = AnyEc(expectedcondition1, expectedcondition2, expectedcondition3, etc...)
>>> found = wait.until(either, "Cannot find any of the expected conditions")
"""
def __init__(self, *args: Callable, description: str = None):
self.ecs = args
self.description = description
def __iter__(self):
return self.ecs.__iter__()
def __call__(self, driver):
for fn in self.ecs:
try:
rt = fn(driver)
if rt:
return rt
except TypeError as exc:
raise exc
except Exception as exc:
# print(exc)
pass
def __repr__(self):
return " ".join(f"{e!r}," for e in self.ecs)
def __str__(self):
return f"{self.description!s}"
either = AnyEc(ec1, ec2, ec3)
found_element = wait.until(either, "Could not find any of the expected elements")
Lastly, if it's possible to do so, you could try waiting for Ajax to be finished.
This is not useful in all cases -- e.g. Ajax is always active. In the cases where Ajax runs and finishes it can work. There are also some ajax libraries that do not set the active attribute, so double check that you can rely on this.
def is_ajax_complete(driver)
rt = driver.execute_script("return jQuery.active", *args)
return rt == 0
wait.until(lambda driver: is_ajax_complete(driver), "Ajax did not finish")

(1) You did not mention the error you get with it
(2) Since you mention
...because driver (current element passed to this function)...
I'll assume this is actually a WebElement. In this case, instead of passing the object itself to your method, simply pass the selector that finds that WebElement (in your case, the ul). If the "driver gets lost in DOM", it could be that re-creating it inside the while current_wait < limit: loop could mitigate the problem
(3) yeap, time.sleep() will only get you that far
(4) Since the li elements loaded dynamically contain class=closed, instead of (By.XPATH, './/ul'), you could try (By.CSS_SELECTOR, 'ul > li.closed') (more details on CSS Selectors here)

Keeping in mind comments of Mr.E. and Arran I made my list traversal fully on CSS selectors. The tricky part was about my own list structure and marks (changing classes, etc.), as well as about creating required selectors on the fly and keeping them in memory during traversal.
I disposed waiting for several elements by searching for anything that is not loading state. You may use ":nth-child" selector as well like here:
#in for loop with enumerate for i
selector.append(' > li:nth-child(%i)' % (i + 1)) # identify child <li> by its order pos
This is my hard-commented code solution for example:
def parse_crippled_shifted_list(driver, frame, selector, level=1, parent_id=0, path=None):
"""
Traversal of html list of special structure (you can't know if element has sub list unless you enter it).
Supports start from remembered list element.
Nested lists have classes "closed" and "last closed" when closed and "open" and "last open" when opened (on <li>).
Elements themselves have classes "leaf" and "last leaf" in both cases.
Nested lists situate in <li> element as <ul> list. Each <ul> appears after clicking <a> in each <li>.
If you click <a> of leaf, page in another frame will load.
driver - WebDriver; frame - frame of the list; selector - selector to current list (<ul>);
level - level of depth, just for console output formatting, parent_id - id of parent category (in DB),
path - remained path in categories (ORM objects) to target category to start with.
"""
# Add current level list elements
# This method selects all but loading. Just what is needed to exclude.
selector.append(' > li > a:not([class=loading])')
# Wait for child list to load
try:
query = WebDriverWait(driver, WAIT_LONG_TIME).until(
EC.presence_of_all_elements_located((By.CSS_SELECTOR, ''.join(selector))))
except TimeoutException:
print "%s timed out" % ''.join(selector)
else:
# List is loaded
del selector[-1] # selector correction: delete last part aimed to get loaded content
selector.append(' > li')
children = driver.find_elements_by_css_selector(''.join(selector)) # fetch list elements
# Walk the whole list
for i, child in enumerate(children):
del selector[-1] # delete non-unique li tag selector
if selector[-1] != ' > ul' and selector[-1] != 'ul.ltr':
del selector[-1]
selector.append(' > li:nth-child(%i)' % (i + 1)) # identify child <li> by its order pos
selector.append(' > a') # add 'li > a' reference to click
child_link = driver.find_element_by_css_selector(''.join(selector))
# If we parse freely further (no need to start from remembered position)
if not path:
# Open child
try:
double_click(driver, child_link)
except InvalidElementStateException:
print "\n\nERROR\n", InvalidElementStateException.message(), '\n\n'
else:
# Determine its type
del selector[-1] # delete changed and already useless link reference
# If <li> is category, it would have <ul> as child now and class="open"
# Check by class is priority, because <li> exists for sure.
current_li = driver.find_element_by_css_selector(''.join(selector))
# Category case - BRANCH
if current_li.get_attribute('class') == 'open' or current_li.get_attribute('class') == 'last open':
new_parent_id = process_category_case(child_link, parent_id, level) # add category to DB
selector.append(' > ul') # forward to nested list
# Wait for nested list to load
try:
query = WebDriverWait(driver, WAIT_LONG_TIME).until(
EC.presence_of_all_elements_located((By.CSS_SELECTOR, ''.join(selector))))
except TimeoutException:
print "\t" * level, "%s timed out (%i secs). Failed to load nested list." %\
''.join(selector), WAIT_LONG_TIME
# Parse nested list
else:
parse_crippled_shifted_list(driver, frame, selector, level + 1, new_parent_id)
# Page case - LEAF
elif current_li.get_attribute('class') == 'leaf' or current_li.get_attribute('class') == 'last leaf':
process_page_case(driver, child_link, level)
else:
raise Exception('Damn! Alien class: %s' % current_li.get_attribute('class'))
# If it's required to continue from specified category
else:
# Check if it's required category
if child_link.text == path[0].name:
# Open required category
try:
double_click(driver, child_link)
except InvalidElementStateException:
print "\n\nERROR\n", InvalidElementStateException.msg, '\n\n'
else:
# This element of list must be always category (have nested list)
del selector[-1] # delete changed and already useless link reference
# If <li> is category, it would have <ul> as child now and class="open"
# Check by class is priority, because <li> exists for sure.
current_li = driver.find_element_by_css_selector(''.join(selector))
# Category case - BRANCH
if current_li.get_attribute('class') == 'open' or current_li.get_attribute('class') == 'last open':
selector.append(' > ul') # forward to nested list
# Wait for nested list to load
try:
query = WebDriverWait(driver, WAIT_LONG_TIME).until(
EC.presence_of_all_elements_located((By.CSS_SELECTOR, ''.join(selector))))
except TimeoutException:
print "\t" * level, "%s timed out (%i secs). Failed to load nested list." %\
''.join(selector), WAIT_LONG_TIME
# Process this nested list
else:
last = path.pop(0)
if len(path) > 0: # If more to parse
print "\t" * level, "Going deeper to: %s" % ''.join(selector)
parse_crippled_shifted_list(driver, frame, selector, level + 1,
parent_id=last.id, path=path)
else: # Current is required
print "\t" * level, "Returning target category: ", ''.join(selector)
path = None
parse_crippled_shifted_list(driver, frame, selector, level + 1, last.id, path=None)
# Page case - LEAF
elif current_li.get_attribute('class') == 'leaf':
pass
else:
print "dummy"
del selector[-2:]

This How I solved the problem that I want to wait until certain amount of post where complete load through AJAX
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
# create a new Chrome session
driver = webdriver.Chrome()
# navigate to your web app.
driver.get("http://my.local.web")
# get the search button
seemore_button = driver.find_element_by_id("seemoreID")
# Count the cant of post
seemore_button.click()
# Wait for 30 sec, until AJAX search load the content
WebDriverWait(driver,30).until(EC.visibility_of_all_elements_located(By.CLASS_NAME, "post")))
# Get the list of post
listpost = driver.find_elements_by_class_name("post")

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python Selenium find_element_by_id with default if not found - python

Related

Python Web Scraping error - Reading from JSON- IndexError: list index out of range - how do I ignore

How to iterate trough a list of web elements that is refreshing every 10 sec?

If Elif issue / Selenium Python

Find XPath Paragraph (following-sibling?)

Python Selenium wait for several elements to load

Categories

Resources