I wrote the following code to scrape a website. It iterates over a list of product codes and enters each one into the search bar with Selenium. If no result is found (if driver.find_element_by_css_selector("div[class='search-did-you-mean']"):), I just clear the search bar to run another search. If there are results (elif driver.find_element_by_css_selector("div[class='result-search']"):), I scrape them.
Here is the code:
for product in product_list:
    inputElement = driver.find_element_by_id("q")
    inputElement.send_keys(product[0])
    inputElement.send_keys(Keys.ENTER)
    inputElement.click()
    time.sleep(5)
    if driver.find_element_by_css_selector("div[class='search-did-you-mean']"):
        time.sleep(5)
        clearResearch = driver.find_element_by_id("q")
        WebDriverWait(driver, 10).until_not(EC.visibility_of_element_located((By.ID, "overley")))
        clearResearch.send_keys(Keys.CONTROL + "a")
        clearResearch.send_keys(Keys.DELETE)
    elif driver.find_element_by_css_selector("div[class='result-search']"):
        time.sleep(5)
        item['price'] = driver.find_element_by_css_selector("span[class='sale-price']").text
        item['desc'] = driver.find_element_by_css_selector("h3[class='product-name']").text
        print(item)
There is no result for the first product code in the list, so the search bar is cleared and a new code is entered. The problem appears with the second item: there are results, but my elif branch never seems to be reached, because I get an Unable to locate element: div[class='search-did-you-mean'] error.
Do you know what is wrong with my code? Thanks a lot.
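This is standard Selenium behavior: find_element_by_css_selector throws an exception (NoSuchElementException) if no element is found. Wrap the call in try/except:
from selenium.common.exceptions import NoSuchElementException

first_product = None
try:
    first_product = driver.find_element_by_css_selector("div[class='search-did-you-mean']")
except NoSuchElementException:
    pass
if first_product:
    ...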
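You can use find_elements_by_css_selector, which returns an empty list instead of raising when nothing matches, and check whether the returned list has elements in it:
if driver.find_elements_by_css_selector("div[class='search-did-you-mean']"):
    ...  # no results: clear the search bar
elif driver.find_elements_by_css_selector("div[class='result-search']"):
    ...  # results found: scrape them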
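The same check can be wrapped in a small helper. This is only a sketch: element_present and classify_search_page are hypothetical names, and it assumes the driver exposes the generic find_elements(by, value) method. The locator is passed as a (by, value) tuple, for example (By.CSS_SELECTOR, "div[class='result-search']") with By imported from selenium.webdriver.common.by.

```python
def element_present(driver, locator):
    """Return True if at least one element matches the (by, value) locator.

    find_elements returns an empty list when nothing matches, so no
    exception handling is needed here.
    """
    return len(driver.find_elements(*locator)) > 0


def classify_search_page(driver):
    """Hypothetical helper: decide which branch of the scraper should run."""
    # "css selector" is the string value behind By.CSS_SELECTOR
    if element_present(driver, ("css selector", "div[class='search-did-you-mean']")):
        return "no-results"
    if element_present(driver, ("css selector", "div[class='result-search']")):
        return "results"
    return "unknown"
```

Because the helper never raises for a missing element, the elif branch is reached normally when the "did you mean" block is absent.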
I am trying to get an element's text from a table using XPath and then print it, but I get the following error:
NoSuchElementException: Message: no such element: Unable to locate element.
I have added waits but still get the same error. How can I get the text?
I've used the following code:
account = WebDriverWait(driver,10).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="ctl00_lc_ucLeftMenu_li_1_4"]/a[2]')))
account.click()
time.sleep(3)
portfolio = WebDriverWait(driver,10).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="ctl00_lc_ucLeftMenu_leaf_2_35"]')))
portfolio.click()
time.sleep(3)
sold = driver.find_element_by_xpath('//*[@id="37ef7b7a-3a62-4d56-a479-29c99031de7e"]/table/tbody/tr[8]/td[5]')
print('The amount is: {}'.format(sold.text))
sold1 = float(sold.text)
Please refer to the attached file; I am looking to get the highlighted text.
The XPath was not the correct one for that particular text. Instead, I used the following one:
//table/tbody/tr[8]/td[10]
The full code looks like this:
account = WebDriverWait(driver,10).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="ctl00_lc_ucLeftMenu_li_1_4"]/a[2]')))
account.click()
time.sleep(3)
portfolio = WebDriverWait(driver,10).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="ctl00_lc_ucLeftMenu_leaf_2_35"]')))
portfolio.click()
time.sleep(3)
sold = driver.find_element_by_xpath('//table/tbody/tr[8]/td[10]')
print('The amount is: {}'.format(sold.text))
sold1 = float(sold.text)
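To see how the positional part of such an XPath picks a cell, here is a small offline sketch using the standard library's xml.etree.ElementTree on a made-up table. The markup and values are invented for illustration, since the real page is not available here. XPath positions are 1-based, so tr[2]/td[3] means the third cell of the second row.

```python
import xml.etree.ElementTree as ET

# Invented sample markup standing in for the real page's table
html = """
<table>
  <tbody>
    <tr><td>name</td><td>qty</td><td>amount</td></tr>
    <tr><td>ACME</td><td>10</td><td>1,250.75</td></tr>
  </tbody>
</table>
""".strip()

table = ET.fromstring(html)

# ElementTree supports a subset of XPath, including 1-based positions
cell = table.find("./tbody/tr[2]/td[3]")

# Strip thousands separators before converting, otherwise float() raises
amount = float(cell.text.replace(",", ""))
print(amount)
```

The same caveat applies to float(sold.text) in the Selenium code: if the cell text contains a thousands separator or a currency symbol, the conversion raises ValueError until those characters are removed.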
Consider the following page:
https://www.cvs.com/shop/advil-pain-reliever-fever-reducer-ibuprofen-tablets-200mg-prodid-1040240?skuid=420321
When you select a different format, like 100 CT, a new price shows up. If you copy that URL and navigate to it, however, it redirects back to the original 10 CT page.
I want to get the price shown on the 100 CT page.
Here is my code, which clicks the right format, but when I read the URL again I still get the 10 CT page.
format_header = browser.find_element_by_css_selector("ul.--horizontalScroll.gbcvs-c-variantSelectorList")
items = format_header.find_elements_by_tag_name('li')
format_count = 1
for item in items:
    text = item.text
    if text == '100 CT':
        break
    else:
        format_count += 1
browser.find_element_by_xpath("(.//*[normalize-space(text()) and normalize-space(.)='Count:'])[1]/following::label["+str(format_count)+"]").click()
print(browser.current_url)
browser.get(browser.current_url)
Here is the code that ran and found the price correctly.
You can get the price with:
print(driver.find_element_by_css_selector("p.shoppdp-c-productPricing__actual").text)
Here is the method code for wait_until_element_not_present:
def wait_until_element_not_present(locator_type, locator):
    if locator_type == 'xpath':
        WebDriverWait(driver, 10).until_not(EC.presence_of_element_located((By.XPATH, locator)))
    elif locator_type == "css":
        WebDriverWait(driver, 10).until_not(EC.presence_of_element_located((By.CSS_SELECTOR, locator)))
@supputuri's answer is the correct one; I only replaced wait_until_element_not_present with the following:
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CLASS_NAME, "gbcvs-c-addToCart__inner"))
)
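Both variants come down to polling a condition until it becomes true (until) or false (until_not), or until a timeout expires. The following is a rough pure-Python illustration of that idea, not Selenium's actual implementation; poll_until and its parameters are hypothetical names.

```python
import time


def poll_until(condition, timeout=10.0, interval=0.5, negate=False):
    """Call condition() repeatedly until its truthiness matches the goal.

    negate=False mimics the idea of `until`; negate=True mimics `until_not`.
    Raises TimeoutError if the goal is not reached within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if bool(result) != negate:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(interval)
```

Waiting for something to appear (the add-to-cart block) rather than for something to disappear is often the more reliable choice, because an element that was never rendered satisfies a "not present" wait immediately.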
I'm trying to loop through a dropdown menu at this URL: https://www.accuform.com/safety-sign/danger-danger-authorized-personnel-only-MADM006
For example, the first dropdown menu (under Options) lists different materials. I want to select each one in turn and gather some other information from the page before moving on to the next material. Here is my current code:
driver = webdriver.Firefox()
driver.get('https://www.accuform.com/safety-sign/danger-danger-authorized-personnel-only-MADM006')
time.sleep(3)
driver.find_element_by_id('x-mark-icon').click()
select = Select(driver.find_element_by_name('Wiqj7mb4rsAq9LB'))
options = select.options
optionsList = []
driver.find_elements_by_class_name('select-wrapper')[0].click()
element = driver.find_element_by_xpath("//select[@name='Wiqj7mb4rsAq9LB']")
actions = ActionChains(driver)
actions.move_to_element(element).perform()
# driver.execute_script("arguments[0].scrollIntoView();", element)
for option in options:  # iterate over the options, place attribute value in list
    optionsList.append(option.get_attribute("value"))
for optionValue in optionsList:
    print("starting loop on option %s" % optionValue)
    # select = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//select[@name='Wiqj7mb4rsAq9LB']")))
    # select = Select(select)
    select.select_by_value(optionValue)
I started with just the loop, but got this error:
ElementNotInteractableException: Message: Element <option> could not be scrolled into view
I then added the WebDriverWait and got a TimeoutException.
I then realized I should probably click on the wrapper that holds the dropdown, so I added the click, which does pop up the menu, but I still got the TimeoutException.
So I thought maybe I should move to the element, which I tried with the ActionChains lines, and I got this error:
WebDriverException: Message: TypeError: rect is undefined
I tried to avoid that error by using this code instead:
# driver.execute_script("arguments[0].scrollIntoView();", element)
That just resulted in the TimeoutException again.
I'm pretty new to Python and Selenium and have basically just been modifying code from SO answers to similar questions, but nothing has worked.
I'm using Python 3.6 and the current versions of Selenium and the Firefox webdriver.
If anything is unclear or if you need more info, just let me know.
Thanks so much!
EDIT: Based on the answer and comments by Kajal Kunda, I've updated my code to the following:
material_dropdown = driver.find_element_by_xpath("//input[@class='select-dropdown']")
driver.execute_script("arguments[0].click();", material_dropdown)
materials = driver.find_elements_by_css_selector("div.select-wrapper ul.dropdown-content li")
for material in materials:
    # material_dropdown = driver.find_element_by_xpath("//input[@class='select-dropdown']")
    # driver.execute_script("arguments[0].click();", material_dropdown)
    # materials=driver.find_elements_by_css_selector("div.select-wrapper ul.dropdown-content li")
    material_ele = material.find_element_by_tag_name('span')
    if material_ele.text != '':
        material_ele.click()
        time.sleep(5)
        price = driver.find_element_by_class_name("dataPriceDisplay")
        print(price.text)
The result is that it successfully prints the price for the first type of material, but then it returns:
StaleElementReferenceException: Message: The element reference of <li class=""> is stale;...
I've tried variations with the commented-out lines inside and outside the loop, but I always get some version of the StaleElementReferenceException.
Any suggestions?
Thanks!
You could do the whole thing with requests. Grab the option values from the drop-down lists, then insert those value attributes into a request URL that returns JSON containing all the info on the page. The same principle applies to the other drop-downs: the id for each drop-down selection is the value attribute of the chosen option, and the ids appear in the URL shown below, each separated by //.
import requests
from bs4 import BeautifulSoup as bs

url = 'https://www.accuform.com/product/getSku/danger-danger-authorized-personnel-only-MADM006/1/false/null//{}//WHFIw3xXmQx8zlz//6wr93DdrFo5JV//WdnO0RpwKpc4fGF'
startURL = 'https://www.accuform.com/safety-sign/danger-danger-authorized-personnel-only-MADM006'
res = requests.get(startURL)
soup = bs(res.content, 'lxml')
materials = [item['value'] for item in soup.select('#Wiqj7mb4rsAq9LB option')]
sizes = [item['value'] for item in soup.select('#WvXESrTyQjM3Ciw option')]
languages = [item['value'] for item in soup.select('#WUYWGMePtpmpmhy option')]
units = [item['value'] for item in soup.select('#W91eqaJ0WPXwe9b option')]
for material in materials:
    data = requests.get(url.format(material)).json()
    soup = bs(data['dataMaterialBullets'], 'lxml')
    lines = [item.text for item in soup.select('li')]
    print(lines)
    print(data['dataPriceDisplay'])
    # etc......
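To extend this to the other drop-downs, the URL can be assembled from one id per drop-down. This is a sketch under assumptions: the URL layout is inferred only from the single example URL above, the slot order (material, size, language, units) is a guess, and build_sku_url and all_sku_urls are hypothetical names. No request is sent here.

```python
from itertools import product

# Prefix taken from the URL in the answer; everything after it is a list of
# option ids, each introduced by "//"
SKU_PREFIX = ('https://www.accuform.com/product/getSku/'
              'danger-danger-authorized-personnel-only-MADM006/1/false/null')


def build_sku_url(option_ids):
    """Join one option id per drop-down onto the prefix, '//'-separated."""
    return SKU_PREFIX + ''.join('//' + option_id for option_id in option_ids)


def all_sku_urls(*option_lists):
    """Yield one URL for every combination of drop-down values."""
    for combo in product(*option_lists):
        yield build_sku_url(combo)
```

With the lists scraped above, iterating over all_sku_urls(materials, sizes, languages, units) and calling requests.get(u).json() on each would cover every combination, provided the endpoint accepts them; that part has not been verified.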
Try the code below. It should work.
import time
from selenium import webdriver

driver = webdriver.Firefox()
driver.get('https://www.accuform.com/safety-sign/danger-danger-authorized-personnel-only-MADM006')
time.sleep(3)
driver.find_element_by_id('x-mark-icon').click()
material_dropdown = driver.find_element_by_xpath("//input[@class='select-dropdown']")
driver.execute_script("arguments[0].click();", material_dropdown)
# Code for material dropdown
materials = driver.find_elements_by_css_selector("div.select-wrapper ul.dropdown-content li")
material_optionsList = []
for material in materials:
    material_ele = material.find_element_by_tag_name('span')
    if material_ele.text != '':
        material_optionsList.append(material_ele.text)
print(material_optionsList)
driver.execute_script("arguments[0].click();", material_dropdown)
size_dropdown = driver.find_element_by_xpath("(//input[@class='select-dropdown'])[2]")
driver.execute_script("arguments[0].click();", size_dropdown)
# Code for size dropdown
Sizes = driver.find_elements_by_css_selector("div.select-wrapper ul.dropdown-content li")
size_optionsList = []
for size in Sizes:
    size_ele = size.find_element_by_tag_name('span')
    if size_ele.text != '':
        size_optionsList.append(size_ele.text)
driver.execute_script("arguments[0].click();", size_dropdown)
Output:
[u'Adhesive Vinyl', u'Plastic', u'Adhesive Dura-Vinyl', u'Aluminum', u'Dura-Plastic\u2122', u'Aluma-Lite\u2122', u'Dura-Fiberglass\u2122', u'Accu-Shield\u2122']
I hope you can do the rest. Let me know if it works for you.
EDIT: Code to loop through the materials and get the price for each one.
for material in range(len(materials)):
    material_ele = materials[material]
    if material_ele.text != '':
        #material_optionsList.append(material_ele.text)
        #material_ele.click()
        driver.execute_script("arguments[0].click();", material_ele)
        time.sleep(2)
        price = driver.find_element_by_id("priceDisplay")
        print(price.text)
        time.sleep(2)
        # Reopen the dropdown and re-locate the items so the references stay fresh
        material_dropdown = driver.find_element_by_xpath("//input[@class='select-dropdown']")
        driver.execute_script("arguments[0].click();", material_dropdown)
        materials = driver.find_elements_by_css_selector("div.select-wrapper ul.dropdown-content li")
        material += 2  # has no effect: the for loop overwrites material on the next pass
Output :
$8.31
$9.06
$13.22
$15.91
$15.91
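The EDIT loop avoids the StaleElementReferenceException because it re-locates the list items after every click instead of reusing references taken before the page changed. A minimal sketch of that pattern in isolation follows; visit_each, find_all and visit are hypothetical names, and the helper does not depend on Selenium, so any callable that returns the current list can be plugged in.

```python
def visit_each(find_all, visit):
    """Apply visit() to every item, re-locating the items before each step.

    find_all: callable returning the *current* list of elements
    visit:    callable taking one element; may cause the page to re-render
    """
    results = []
    index = 0
    while True:
        elements = find_all()          # fresh references on every pass
        if index >= len(elements):
            break
        results.append(visit(elements[index]))
        index += 1
    return results
```

With Selenium, find_all could be a lambda that reopens the dropdown and calls find_elements_by_css_selector("div.select-wrapper ul.dropdown-content li"), while visit clicks the item and returns the price text.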
I am trying to iterate through a list that refreshes every 10 seconds.
This is what I have tried:
driver.get("https://www.winmasters.ro/ro/live-betting/")
events = driver.find_elements_by_css_selector('.event-wrapper.v1.event-live.odds-hidden.event-sport-1')
for i in range(len(events)):
    try:
        event = events[i]
        name = event.find_element_by_css_selector('.event-details-team-name.event-details-team-a')  # the error occurs here
    except:  # NoSuchElementException or StaleElementReferenceException
        time.sleep(3)  # I have tried up to 20 sec
        event = events[i]
        name = event.find_element_by_css_selector('.event-details-team-name.event-details-team-a')
This did not work, so I tried another except:
    except:  # second try that also did not work
        element = WebDriverWait(driver, 20).until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, '.event-details-team-name.event-details-team-a'))
        )
        name = event.find_element_by_css_selector('.event-details-team-name.event-details-team-a')
Now I am assigning a placeholder that I will never use to name, like this:
    try:
        event = events[i]
        name = event.find_element_by_css_selector('.event-details-team-name.event-details-team-a')
    except:
        name = "blablabla"
With this code, when the page refreshes I get about 7 or 8 of the "blablabla" placeholders until it finds my selector on the page again.
You can get all the required data using JavaScript.
The code below gives you a list of event maps with all details in a single call, without NoSuchElementException or StaleElementReferenceException errors:
me_id : unique identifier
href : link to the event details page
team_a : name of the first team
team_a_score : score of the first team
team_b : name of the second team
team_b_score : score of the second team
event_status : status of the event
event_clock : time of the event
events = driver.execute_script('return [...document.querySelectorAll(\'[data-uat="live-betting-overview-leagues"] .events-for-league .event-live\')].map(e=>{return {me_id:e.getAttribute("me_id"), href:e.querySelector("a.event-details-live").href, team_a:e.querySelector(".event-details-team-a").textContent, team_a_score:e.querySelector(".event-details-score-1").textContent, team_b:e.querySelector(".event-details-team-b").textContent, team_b_score:e.querySelector(".event-details-score-2").textContent, event_status:e.querySelector(\'[data-uat="event-status"]\').textContent, event_clock:e.querySelector(\'[data-uat="event-clock"]\').textContent}})')
for event in events:
    print(event.get('me_id'))
    print(event.get('href'))  # using href you can open event details with: driver.get(event.get('href'))
    print(event.get('team_a'))
    print(event.get('team_a_score'))
    print(event.get('team_b'))
    print(event.get('team_b_score'))
    print(event.get('event_status'))
    print(event.get('event_clock'))
One primary problem is that you are acquiring all of the elements up front and then iterating through that list. As the page itself updates frequently, the elements you've already acquired have gone "stale", meaning they are no longer associated with current DOM objects. When you try to use those stale elements, Selenium throws a StaleElementReferenceException because it has no way of doing anything with those now out-of-date objects.
One way to overcome this is to acquire an element only right when you need it, rather than fetching them all up front. I personally feel the cleanest approach is to use the CSS :nth-child() selector:
from selenium import webdriver


def main():
    base_css = '.event-wrapper.v1.event-live.odds-hidden.event-sport-1'
    driver = webdriver.Chrome()
    try:
        driver.get("https://www.winmasters.ro/ro/live-betting/")
        # Get a list of all elements
        events = driver.find_elements_by_css_selector(base_css)
        print("Found {} events".format(len(events)))
        # Iterate through the list, keeping track of the index
        # note that nth-child referencing begins at index 1, not 0
        for index, _ in enumerate(events, 1):
            name = driver.find_element_by_css_selector("{}:nth-child({}) {}".format(
                base_css,
                index,
                '.event-details-team-name.event-details-team-a'
            ))
            print(name.text)
    finally:
        driver.quit()


if __name__ == "__main__":
    main()
If I run the above script, I get this output:
$ python script.py
Found 2 events
Hapoel Haifa
FC Ashdod
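The selector string built inside the loop can be checked on its own, without a browser. The helper name nth_child_selector is hypothetical; the class names are the ones used in the script above.

```python
def nth_child_selector(base_css, index, child_css):
    """Build '<base>:nth-child(<index>) <child>' for a 1-based index."""
    if index < 1:
        raise ValueError("nth-child indexes start at 1")
    return "{}:nth-child({}) {}".format(base_css, index, child_css)


base_css = '.event-wrapper.v1.event-live.odds-hidden.event-sport-1'
team_css = '.event-details-team-name.event-details-team-a'

for i in range(1, 3):
    print(nth_child_selector(base_css, i, team_css))
```

One caveat: :nth-child(n) counts an element's position among all siblings under its parent, not only among those matching base_css. The index lines up with the list position only if the event wrappers are the sole children of their container, which the script above implicitly assumes.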
Now, as the underlying webpage really does update a lot, there is still a decent chance you will hit a StaleElementReferenceException (SERE). To overcome that, you can use a retry decorator (pip install retry to get the package) to handle the SERE and reacquire the element:
import retry
from selenium import webdriver
from selenium.common.exceptions import StaleElementReferenceException


@retry.retry(StaleElementReferenceException, tries=3)
def get_name(driver, selector):
    elem = driver.find_element_by_css_selector(selector)
    return elem.text


def main():
    base_css = '.event-wrapper.v1.event-live.odds-hidden.event-sport-1'
    driver = webdriver.Chrome()
    try:
        driver.get("https://www.winmasters.ro/ro/live-betting/")
        events = driver.find_elements_by_css_selector(base_css)
        print("Found {} events".format(len(events)))
        for index, _ in enumerate(events, 1):
            name = get_name(
                driver,
                "{}:nth-child({}) {}".format(
                    base_css,
                    index,
                    '.event-details-team-name.event-details-team-a'
                )
            )
            print(name)
    finally:
        driver.quit()


if __name__ == "__main__":
    main()
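For readers who would rather not add a dependency, here is a hand-rolled sketch of what such a retry decorator does. It is not the retry package's implementation; retry_on and FakeStaleError are hypothetical names, and FakeStaleError stands in for StaleElementReferenceException so the example runs without a browser.

```python
import functools
import time


class FakeStaleError(Exception):
    """Stand-in for StaleElementReferenceException in this offline sketch."""


def retry_on(exc_type, tries=3, delay=0.0):
    """Decorator: re-run the function when exc_type is raised, up to `tries` times."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, tries + 1):
                try:
                    return func(*args, **kwargs)
                except exc_type:
                    if attempt == tries:
                        raise          # out of attempts: let the error propagate
                    time.sleep(delay)
        return wrapper
    return decorator


attempts = []


@retry_on(FakeStaleError, tries=3)
def flaky_lookup():
    """Fail twice with a stale error, then succeed on the third call."""
    attempts.append(1)
    if len(attempts) < 3:
        raise FakeStaleError("element went stale")
    return "Hapoel Haifa"


print(flaky_lookup())
```

The important detail is that the whole lookup (find the element, then read its text) sits inside the retried function, so each attempt starts from a fresh element reference.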
Now, despite the above examples, I think you still have issues with your CSS selectors, which is the primary reason for the NoSuchElement exceptions. I can't help with that without a better description of what you are actually trying to accomplish with this script.
I am trying to refresh the page until an item appears, but my code doesn't work (I based it on this question: python selenium keep refreshing until item found (Chromedriver)).
Here is the code:
while True:
    try:
        for h1 in driver.find_elements_by_class_name("name-link"):
            text = h1.text.replace('\uFEFF', "")
            if "Puffy" in text:
                break
    except NoSuchElementException:
        driver.refresh
    else:
        for h1 in driver.find_elements_by_class_name("name-link"):
            text = h1.text.replace('\uFEFF', "")
            if "Puffy" in text:
                h1.click()
                break
        break
This fragment is there because I have to find one item among several with the same class name and replace the BOM with "" (find_element_by_partial_link_text didn't work):
for h1 in driver.find_elements_by_class_name("name-link"):
    text = h1.text.replace('\uFEFF', "")
    if "Puffy" in text:
        break
Could someone help me? Thanks a lot.
You're trying to get a list of elements: driver.find_elements_by_class_name() returns either a list of elements or an empty list and never raises, so you cannot get NoSuchElementException here and the refresh is never executed. Note also that driver.refresh without parentheses only references the method; it has to be called as driver.refresh(). Try this instead:
while True:
    if any(["Puffy" in h1.text.replace('\uFEFF', "") for h1 in driver.find_elements_by_class_name("name-link")]):
        break
    else:
        driver.refresh()
driver.find_element_by_xpath("//*[contains(., 'Puffy')]").click()
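A bounded variant of the same loop, sketched under assumptions: refresh_until_text and its parameters are hypothetical names, and the driver is only expected to provide find_elements(by, value) and refresh(). It stops after a fixed number of refreshes instead of looping forever, and it returns the matching element so it can be clicked directly.

```python
import time


def refresh_until_text(driver, locator, needle, max_refreshes=20, pause=1.0):
    """Refresh the page until an element matching `locator` contains `needle`.

    Returns the matching element, or None after `max_refreshes` refreshes.
    `locator` is a (by, value) tuple, e.g. (By.CLASS_NAME, "name-link").
    """
    for attempt in range(max_refreshes + 1):
        for element in driver.find_elements(*locator):
            # Strip the byte-order mark before comparing, as in the question
            if needle in element.text.replace('\ufeff', ''):
                return element
        if attempt < max_refreshes:
            driver.refresh()
            time.sleep(pause)
    return None
```

Clicking the returned element is also more targeted than the XPath //*[contains(., 'Puffy')], which matches every ancestor of the text as well, so find_element is likely to return an outer container rather than the link itself.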