Actually, this is my second question on the same subject.
But in the original question I put so many functions in my code that they acted as a distraction.
So in this post I deleted all the unnecessary functions and focused on my problem.
What I want to do is the following:
1. open a URL in the Firefox browser (using Selenium)
2. click into a page
3. click every thumbnail in a loop until the loop hits a NoSuchElementException error
4. stop the loop when it hits the error
Here is my code:
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
import time
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
#1. opening Firefox and go to an URL
driver = webdriver.Firefox()
url = "https://www.amazon.com/Kraft-Original-Macaroni-Microwaveable-Packets/dp/B005ECO3H0"
driver.get(url)
action = ActionChains(driver)
time.sleep(5)
#2. going to the main images page
driver.find_element_by_css_selector('#landingImage').click()
time.sleep(2)
#3 parsing the imgs and click them
n = 0
for i in range(1, 10):
    try:
        driver.find_element_by_css_selector(f'#ivImage_{n}').click()
        element = WebDriverWait(driver, 20, 1).until(
            EC.presence_of_element_located((By.CLASS_NAME, "fullscreen"))
        )
        n = n + 1
    except NoSuchElementException:
        break
driver.close()
and the error stack trace is like this:
Exception has occurred: NoSuchElementException
Message: Unable to locate element: #ivImage_6
File "C:\Users\Administrator\Desktop\pythonworkspace\except_test.py", line 25, in <module>
driver.find_element_by_css_selector(f'#ivImage_{n}').click()
FYI, all the thumbnail images have IDs of the form ivImage_<number>.
I don't know why my try-except statement is not working.
Am I missing something?
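One thing worth checking (a guess, not a confirmed diagnosis): the try block contains two calls that fail with different exceptions. find_element_by_css_selector raises NoSuchElementException, but WebDriverWait(...).until(...) raises TimeoutException when the condition never holds, and that exception is not caught. Debuggers such as VS Code can also report "Exception has occurred" even for exceptions that an except clause later handles. A minimal sketch that catches both:

from selenium.common.exceptions import NoSuchElementException, TimeoutException

n = 0
for i in range(1, 10):
    try:
        driver.find_element_by_css_selector(f'#ivImage_{n}').click()
        WebDriverWait(driver, 20, 1).until(
            EC.presence_of_element_located((By.CLASS_NAME, "fullscreen"))
        )
        n = n + 1
    except (NoSuchElementException, TimeoutException):
        # stop when either the thumbnail is missing or the
        # fullscreen view never appears
        break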
I am going through a site that has a Load More button which I need to click until it no longer appears on the site. I have written the code below, but I'm not sure if there is a better way to handle this than a while loop. Are there other Selenium methods to handle this?
driver.get(url)
while driver.find_element_by_xpath("//xpath").is_displayed():
    try:
        loadmore = driver.find_element_by_xpath("//xpath")
        loadmore.click()
    except Exception as e:
        break
This works, but I get 'NoneType' object has no attribute 'is_displayed' after it's done with all the clicking, so I wrote another version:
while True:
    try:
        loadmore = driver.find_element_by_xpath("//xpath")
        loadmore.click()
    except Exception as e:
        break
This works without errors, as the exception is caught since I don't use the is_displayed method.
You can use waits with expected_conditions, e.g.:
from selenium import webdriver
from selenium.webdriver.support import expected_conditions as ec
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
driver = webdriver.Chrome()
driver.get('theurl')
wait = WebDriverWait(driver, 10) # increase the timeout as needed
el = wait.until(ec.visibility_of_element_located((By.XPATH, "//xpath")))
el.click()
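To keep clicking until the button no longer appears, the same wait can be wrapped in a loop that exits on TimeoutException, for example (a sketch, reusing the //xpath placeholder from the question):

from selenium.common.exceptions import TimeoutException

while True:
    try:
        # wait up to 10 seconds for the Load More button to reappear
        el = wait.until(ec.visibility_of_element_located((By.XPATH, "//xpath")))
        el.click()
    except TimeoutException:
        break  # the button is gone, so everything has been loaded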
I've written some code in Python in combination with Selenium to parse the different questions from quora.com. My scraper is doing its job at this moment. The thing is, I've used a hardcoded delay for the scraper to work, even though an explicit wait has already been defined. As the page is an infinite-scrolling one, I tried to limit the scrolling to a fixed number of iterations. Now I have two questions:
1. Why is wait.until(EC.staleness_of(page)) not working within my scraper? It is commented out now.
2. If I use something else instead of page = wait.until(EC.visibility_of_element_located((By.CLASS_NAME, "question_link"))), the scraper throws an error: can't focus element.
Btw, I do not wish to go for the page = driver.find_element_by_tag_name('body') option.
Here is what I've written so far:
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
driver.get("https://www.quora.com/topic/C-programming-language")
wait = WebDriverWait(driver, 10)
page = wait.until(EC.visibility_of_element_located((By.CLASS_NAME, "question_link")))
for scroll in range(10):
    page.send_keys(Keys.PAGE_DOWN)
    time.sleep(2)
    # wait.until(EC.staleness_of(page))

for item in wait.until(EC.visibility_of_all_elements_located((By.CLASS_NAME, "rendered_qtext"))):
    print(item.text)
driver.quit()
You can try the code below to trigger as many XHR loads as possible and then parse the page:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
driver = webdriver.Chrome()
driver.get("https://www.quora.com/topic/C-programming-language")
wait = WebDriverWait(driver, 10)
page = wait.until(EC.visibility_of_element_located((By.CLASS_NAME, "question_link")))
links_counter = len(wait.until(EC.visibility_of_all_elements_located((By.CLASS_NAME, "question_link"))))
while True:
    page.send_keys(Keys.END)
    try:
        wait.until(lambda driver: len(driver.find_elements_by_class_name("question_link")) > links_counter)
        links_counter = len(driver.find_elements_by_class_name("question_link"))
    except TimeoutException:
        break

for item in wait.until(EC.visibility_of_all_elements_located((By.CLASS_NAME, "rendered_qtext"))):
    print(item.text)
driver.quit()
Here we scroll the page down and wait up to 10 seconds for more links to be loaded, or break out of the while loop if the number of links remains the same.
As for your questions:
wait.until(EC.staleness_of(page)) is not working because scrolling the page down doesn't fetch a new DOM - it just fires an XHR that adds more links to the existing DOM, so the first link (page) never becomes stale in this case.
(I'm not quite confident about this, but...) I guess you can send keys only to nodes that can take focus (where a user could set focus manually), e.g. links, input fields, textareas, buttons..., but not to content divisions (div), paragraphs (p), etc.
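If the focus restriction is the obstacle, one alternative (my suggestion, not part of the original answer) is to drop send_keys entirely and scroll with JavaScript, which works no matter which element has focus:

while True:
    # scroll to the bottom of the page without needing a focusable element
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    try:
        wait.until(lambda driver: len(driver.find_elements_by_class_name("question_link")) > links_counter)
        links_counter = len(driver.find_elements_by_class_name("question_link"))
    except TimeoutException:
        break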
On the website mentioned below, when I select the date 27-Jun-2017 and Series/Run rates as "USD RATES 1100" and submit, the rates open below on that page. Up to this point I am able to do it programmatically. But I need the 10-year rate (the answer is 2.17) for the above date and rate combination. Can someone please tell me what error I am making in the last line of the code?
https://www.theice.com/marketdata/reports/180
from selenium import webdriver

chrome_path = r"C:\Users\vick\Desktop\python_1\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get("https://www.theice.com/marketdata/reports/180")
try:
    driver.find_element_by_xpath('/html/body/div[3]/div/div[2]/div/div/div[2]/button').click()
except:
    pass
driver.find_element_by_xpath('//*[@id="seriesNameAndRunCode_chosen"]/a/span').click()
driver.find_element_by_xpath('//*[@id="seriesNameAndRunCode_chosen"]/div/ul/li[5]').click()
driver.find_element_by_xpath('//*[@id="reportDate"]').clear()
driver.find_element_by_xpath('//*[@id="reportDate"]').send_keys("27-Jul-2017")
driver.find_element_by_xpath('//*[@id="selectForm"]/input').click()
driver.execute_script("window.scrollTo(0, document.body.scrollHeight)/2;")
print(driver.find_element_by_xpath('//*[@id="report-content"]/div/div/table/tbody/tr[10]/td[2]').get_attribute('innerHTML'))
The error I am getting on the last line:
NoSuchElementException: no such element: Unable to locate element: {"method":"xpath","selector":"//*[@id="report-content"]/div/div/table/tbody/tr[10]/td[2]"}
Thank you for the help.
You have to wait a second or two after you click the input field. Like:
import time
from selenium import webdriver

chrome_path = r"C:\Users\vick\Desktop\python_1\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get("https://www.theice.com/marketdata/reports/180")
try:
    driver.find_element_by_xpath('/html/body/div[3]/div/div[2]/div/div/div[2]/button').click()
except:
    pass
driver.find_element_by_xpath('//*[@id="seriesNameAndRunCode_chosen"]/a/span').click()
driver.find_element_by_xpath('//*[@id="seriesNameAndRunCode_chosen"]/div/ul/li[5]').click()
driver.find_element_by_xpath('//*[@id="reportDate"]').clear()
driver.find_element_by_xpath('//*[@id="reportDate"]').send_keys("27-Jul-2017")
driver.find_element_by_xpath('//*[@id="selectForm"]/input').click()
driver.execute_script("window.scrollTo(0, document.body.scrollHeight)/2;")
time.sleep(2)  # here is the part where you should wait
print(driver.find_element_by_xpath('//*[@id="report-content"]/div/div/table/tbody/tr[10]/td[2]').get_attribute('innerHTML'))
Option B is to wait until the element has been loaded:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import TimeoutException
....
driver.execute_script("window.scrollTo(0,document.body.scrollHeight)/2;")
timeout = 5
try:
    element_present = EC.presence_of_element_located((By.ID, 'report-content'))
    WebDriverWait(driver, timeout).until(element_present)
except TimeoutException:
    print("Timed out waiting for page to load")
......
print(driver.find_element_by_xpath('//*[@id="report-content"]/div/div/table/tbody/tr[10]/td[2]').get_attribute('innerHTML'))
In the first case Python waits 2 seconds and then continues.
In the second case the WebDriver waits until the element is loaded (for a maximum of 5 seconds).
I tried the code and it works. Hope that helped.
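A stricter variant (my addition; it assumes the table rows render after #report-content appears) is to wait for the target cell itself rather than for its container, which avoids reading the table before the rows exist:

cell_xpath = '//*[@id="report-content"]/div/div/table/tbody/tr[10]/td[2]'
try:
    cell = WebDriverWait(driver, timeout).until(
        EC.presence_of_element_located((By.XPATH, cell_xpath))
    )
    print(cell.get_attribute('innerHTML'))
except TimeoutException:
    print("Timed out waiting for the rate table")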
I am writing scraping code for the website Upwork and need to click through each page of job listings. Here is my Python code, which uses Selenium to crawl the site.
from bs4 import BeautifulSoup
import requests
from os.path import basename
from selenium import webdriver
import time
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
driver = webdriver.Chrome("./chromedriver")
driver.get("https://www.upwork.com/o/jobs/browse/c/design-creative/")
link = driver.find_element_by_link_text("Next")
while EC.elementToBeClickable(By.linkText("Next")):
    wait.until(EC.element_to_be_clickable((By.linkText, "Next")))
    link.click()
There are a couple of problems:
1. EC has no attribute elementToBeClickable. In Python you should use element_to_be_clickable.
2. Your link is defined on the first page only, so using it on the second page will give you a StaleElementReferenceException.
3. There is no wait variable defined in your code. I guess you mean something like
   wait = WebDriverWait(driver, 10)
4. By has no attribute linkText. Try LINK_TEXT instead.
Try the code below to get the required behavior:
from selenium.common.exceptions import TimeoutException

while True:
    try:
        WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.LINK_TEXT, "Next"))).click()
    except TimeoutException:
        break
This should allow you to click the Next button while it's available.
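Since the listings re-render after each click, even a freshly found link can go stale between the wait and the click. A defensive variant (my addition, not from the original answer) retries on StaleElementReferenceException:

from selenium.common.exceptions import TimeoutException, StaleElementReferenceException

while True:
    try:
        WebDriverWait(driver, 10).until(
            EC.element_to_be_clickable((By.LINK_TEXT, "Next"))
        ).click()
    except StaleElementReferenceException:
        continue  # the page re-rendered mid-click; retry with a fresh lookup
    except TimeoutException:
        break  # no Next button anymore, so we reached the last page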
I'm trying to make Selenium wait for a specific element (near the bottom of the page), since I have to wait until the page is fully loaded.
I'm confused by its behavior.
I'm not an expert in Selenium, but I expected this to work:
from selenium import webdriver
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
driver = webdriver.Firefox()
wait = WebDriverWait(driver, 10)
def load_page():
    driver.get('http://www.firmy.cz/?geo=0&q=hodinov%C3%BD+man%C5%BEel&thru=sug')
    wait.until(EC.visibility_of_element_located((By.PARTIAL_LINK_TEXT, 'Zobrazujeme')))
    html = driver.page_source
    print html

load_page()
TIMEOUT:
File "C:\Python27\lib\site-packages\selenium\webdriver\support\wait.py", line 78, in until
raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:
I'm just trying to see the HTML of the fully loaded page. It raises TimeoutException, but I'm sure the element is already there. I've also tried another approach:
wait.until(EC.visibility_of_element_located(driver.find_element_by_xpath('//a[#class="companyTitle"]')))
But this approach raises error too:
selenium.common.exceptions.NoSuchElementException:
Message: Unable to locate element:
{"method":"xpath","selector":"//a[#class=\"companyTitle\"]"}
Loading the site takes a long time; use an implicit wait.
In this case, when you are interested in the whole HTML, you don't have to wait for a specific element at the bottom of the page.
The load_page function will print the HTML as soon as the whole site is loaded, provided you give the browser enough time to do so with implicitly_wait():
from selenium import webdriver
driver = webdriver.Firefox()
# wait max 30 seconds till any element is located
# or the site is loaded
driver.implicitly_wait(30)
def load_page():
    driver.get('http://www.firmy.cz/?geo=0&q=hodinov%C3%BD+man%C5%BEel&thru=sug')
    html = driver.page_source
    print html

load_page()
The main issue in your code is a wrong selector.
If you want to wait until the web element containing the text Zobrazujeme has loaded and then print the page source:
from selenium import webdriver
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
driver = webdriver.Firefox()
wait = WebDriverWait(driver, 10)
def load_page():
    driver.get('http://www.firmy.cz/?geo=0&q=hodinov%C3%BD+man%C5%BEel&thru=sug')
    wait.until(EC.visibility_of_element_located((By.CLASS_NAME, 'switchInfoExt')))
    html = driver.page_source
    print html

load_page()