I am scraping a webpage using Selenium. I first find the link I want, then click on it and download it (the link is a PDF). Sometimes this works, but sometimes Selenium says the link is not found. I suppose this is due to the page not loading properly. What can I do about this, and am I heading in the right direction?
This is my previous code:
for b in source_code_2.find_all('a', href=True):
    if b.has_attr("title"):
        if b['title'] == 'Click here to download':
            urllib2.urlretrieve(full_url)
Now I want to do this using Selenium and its element handling. How can I do this?
I think you should use an explicit wait to tell Selenium to wait until the specific element has loaded properly. In Python you can use an explicit wait in the following way:
element = WebDriverWait(driver, 20).until(
    EC.presence_of_element_located((By.ID, "yourElement")))

OR

element = WebDriverWait(driver, 20).until(
    EC.element_to_be_clickable((By.ID, "yourElement")))
element.click()
You just need to substitute your element's ID in the code above, and you can change the 20 seconds to 30 or 40 as needed. The code tells the WebDriver to wait up to 20 seconds to find that specific element; if it appears sooner, execution continues immediately.
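As a minimal sketch tying this back to the original PDF question (assuming the link still carries the title Click here to download from the BeautifulSoup snippet above; page_url is a placeholder):

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get(page_url)  # page_url: placeholder for the page being scraped

# Wait up to 20 seconds for the PDF link to become clickable, then click it
link = WebDriverWait(driver, 20).until(
    EC.element_to_be_clickable((By.CSS_SELECTOR, "a[title='Click here to download']")))
link.click()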
Does anyone know how I can get past this pop-up using Selenium? When I log into Facebook normally it doesn't come up, but for some reason unknown to me, it keeps appearing when I run my script.
cookie_button = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button[id='u_0_j_I5']"))).click()
This is the script I'm trying to use to get rid of it, but it isn't working.
The CSS selector that you are using:
button[id='u_0_j_I5']
looks dynamic and brittle. That means every time you refresh the page, a different id may be generated by the backend.
I would suggest using a locator that is reliable and not easy to break.
A CSS selector:
input[data-cookiebanner='accept_button']
or an XPath:
//input[@data-cookiebanner='accept_button']
but you should be mindful of the uniqueness of the locator.
Please check in the dev tools (Google Chrome) whether the locator has a unique entry in the HTML DOM.
Steps to check:
Press F12 in Chrome -> go to the Elements section -> press CTRL + F -> paste the XPath/CSS and check that your desired element is highlighted as the 1/1 matching node.
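If you prefer, you can also verify uniqueness from the script itself; a quick sketch using the same candidate selector:

matches = driver.find_elements_by_css_selector("input[data-cookiebanner='accept_button']")
print(len(matches))  # a unique locator should report exactly 1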
If they are unique, then you can use the below code:
cookie_button = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "input[data-cookiebanner='accept_button']")))
cookie_button.click()
Update:
Use the below XPath:
//button[contains(text(),'Allow All Cookies')]
and click it like this:
cookie_button = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//button[contains(text(),'Allow All Cookies')]")))
cookie_button.click()
I'm trying to use Selenium to scrape Google Maps. Unfortunately it's not quite working: the element is not present on page load and is added after clicking on a button, but it seems the element is not always loaded when I look for it. (I'm talking about the carousel items that appear after clicking on a shop or restaurant while doing a specific search.)
I already tried to use the classic Selenium wait options.
Things I tried:
time.sleep()
WebDriverWait(driver, 30).until(EC.visibility_of_element_located(...))
WebDriverWait(driver, 30).until(EC.element_to_be_clickable(...))
WebDriverWait(driver, 60).until(EC.presence_of_all_elements_located(...))
Even with these, the results are random: sometimes I can access the element via Selenium and sometimes I can't.
It seems that even with long waits, Selenium can't always access the web element, even though I can see it and click it myself.
You can make the web driver wait until the element is clickable. Have you tried that?
element = WebDriverWait(driver, 20).until(
    EC.presence_of_element_located((By.ID, "myElement")))

or a better way of doing this, as pointed out here:

element = WebDriverWait(driver, 20).until(
    EC.element_to_be_clickable((By.XPATH, "myXpath")))
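If the waits alone stay flaky, as the question describes, one pragmatic sketch is to retry the wait a few times and re-trigger the action that reveals the carousel in between; the locator and the re-trigger step below are placeholders:

from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

element = None
for attempt in range(3):
    try:
        element = WebDriverWait(driver, 10).until(
            EC.element_to_be_clickable((By.XPATH, "myXpath")))  # placeholder locator
        break
    except TimeoutException:
        # placeholder: re-click the shop/restaurant result that opens the carousel
        driver.refresh()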
I'm trying to find all my subjects on the dashboard of my college website.
I'm using Selenium to do it.
The site is a little slow, so first I wait:
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//span[@class='multiline']")))
Then I find all the elements with:
course = driver.find_elements_by_xpath("//span[@class='multiline']")
After that, I traverse it in a for loop. The 0th item of course works fine: I'm able to click it and go to the webpage. But when the loop runs for the second time, i.e. for item 1 of course, it gives me the error: selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
So I tried adding a little wait time using these two methods, but it still gives me the error:
driver.implicitly_wait(20)
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//span[@class='multiline']")))
The loop:
for i in course[1:]:
    # driver.implicitly_wait(20)
    # WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//span[@class='multiline']")))
    print(i)
    i.click()
    driver.implicitly_wait(2)
    driver.back()
(A snippet of the website was attached here.)
Thanks in advance
Answering my own question after extensive research:
A common technique for simulating a tabbed UI in a web app is to prepare DIVs for each tab but only attach one at a time, storing the rest in variables. In this case my code held a reference to an element that was no longer attached to the DOM (that is, one that no longer has document.documentElement as an ancestor). WebDriver throws a stale element exception in this case: even though the element still exists, the reference is lost. You should discard the current reference you hold and replace it, possibly by locating the element again once it is attached to the DOM:
for i in range(len(course)):
    # Find all the elements again: once we leave the page,
    # the old references are lost and we need to locate them afresh
    course = driver.find_elements_by_xpath("//span[@class='multiline']")
    print(course[i].text)
    course[i].click()
    driver.implicitly_wait(2)
    driver.back()
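A hypothetical alternative that sidesteps staleness entirely: if each course element exposes a link target (an assumption; the real dashboard markup may differ), collect the URLs first and navigate to each one directly, so no stored element reference can go stale:

# Assumes each span sits inside an <a> tag; adjust to the actual markup
links = [a.get_attribute("href")
         for a in driver.find_elements_by_xpath("//span[@class='multiline']/ancestor::a")]
for link in links:
    driver.get(link)  # each get() loads a fresh page, no driver.back() needed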
I have been trying to find a button and click on it, but no matter what I try, Selenium has been unable to locate it. I have tried all the driver.find_element_by... methods, but nothing seems to work.
from selenium import webdriver
import time
driver = webdriver.Chrome(executable_path="/Users/shreygupta/Documents/ComputerScience/PythonLanguage/Automation/corona/chromedriver")
driver.get("https://ourworldindata.org/coronavirus")
driver.maximize_window()
time.sleep(5)
driver.find_element_by_css_selector("a[data-track-note='chart-click-data']").click()
I am trying to click the DATA tab on the screenshot below
You can modify your script to open this graph directly:
driver.get("https://ourworldindata.org/grapher/total-cases-covid-19")
driver.maximize_window()
Then you can add implicitly_wait instead of sleep. An implicit wait tells WebDriver to poll the DOM for a certain amount of time when trying to find any element (or elements) not immediately available (from the Python documentation). It will also work faster, because it interacts with the element as soon as it is found.
driver.implicitly_wait(5)
driver.find_element_by_css_selector("a[data-track-note='chart-click-data']").click()
Hope this helps, good luck.
Here is the logic you can use: the script will wait a maximum of 30 seconds for the Data menu item, and if the element is present within those 30 seconds it will click on it.
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

url = "https://ourworldindata.org/grapher/covid-confirmed-cases-since-100th-case"
driver.get(url)
driver.maximize_window()
wait = WebDriverWait(driver, 30)
wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "a[data-track-note='chart-click-data']"))).click()
I want to iterate through a set of URLs using Selenium. From time to time I get 'element is not attached to the page document'. Reading a couple of other questions indicated that it's because I am changing the page being looked at. But I am not satisfied with that argument, since:
for url in urlList:
    driver.get(url)
    WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.XPATH, '//div/div')))
    # ^ WebDriverWait should have taken care of it
    myString = driver.find_element_by_xpath('//div/div').get_attribute("innerHTML")
    # ^ Error occurs here
    # Then I call this function to go through other elements, given other conditions not shown
    if myString:
        getMoreElements(driver)
But if I add a delay like this:
for url in urlList:
    driver.get(url)
    time.sleep(5)  # <<< IT WORKS, BUT WHY?
    element = WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.XPATH, '//div/div')))
    myString = driver.find_element_by_xpath('//div/div').get_attribute("innerHTML")  # Error occurs here
I feel I am hiding the problem by adding the delay there. I have implicitly_wait set to 30 s and set_page_load_timeout set to 90 s, which should have been sufficient. So why do I still have to add what looks like a useless time.sleep?
Did you try the XPath //div/div manually in the dev tools to see how many divs are found on the page? I think there should be many. If so, your explicit wait below is very easy to satisfy, probably in no more than a second: Selenium can find such a div right after driver.get(), and your wait ends.
WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.XPATH, '//div/div')))
Consider the following possibility:
Because the explicit wait above is satisfied so early, page loading is not complete and more and more //div/div elements are still being rendered into the page. At that point you ask Selenium to find such a div and interact with it.
Now think about the chance that the first div Selenium found is deleted or moved to another DOM node while the page keeps rendering.
Do you think that chance is high or low? I think it's very high, because div is a very common tag in today's web pages, and you are using such a relaxed XPath that many matching divs are found, each of which can cause the 'stale element' issue.
To resolve the issue, wait on a stricter locator for a specific element, rather than such a hasty XPath that matches very common, numerous elements.
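For example, instead of //div/div you might wait on a marker that identifies the actual content you care about; the selector below is purely illustrative:

# Illustrative only: wait for a container that is specific to the target content
WebDriverWait(driver, 30).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, "div#content div.article-body")))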
What you observe as element is not attached to the page document is quite possible.
Analysis:
In your code, while iterating over urlList, you open a URL and then wait for the WebElement with XPath //div/div using the ExpectedConditions clause presence_of_element_located, which does not necessarily mean that the element is visible or clickable.
Hence, when you next try driver.find_element_by_xpath('//div/div').get_attribute("innerHTML"), the reference from the previous search/find_element may no longer be valid.
Solution:
The solution would be to change the ExpectedConditions clause from presence_of_element_located to element_to_be_clickable, which checks that the element is visible and enabled, such that you can even click it.
Code Block:
Your optimized code block may look like:
for url in urlList:
    driver.get(url)
    WebDriverWait(driver, 5).until(EC.element_to_be_clickable((By.XPATH, '//div/div')))
    myString = driver.find_element_by_xpath('//div/div').get_attribute("innerHTML")
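As a side note, until() returns whatever the expected condition returns, and element_to_be_clickable returns the element itself, so you can reuse it instead of locating the element a second time; a minimal variation of the block above:

for url in urlList:
    driver.get(url)
    element = WebDriverWait(driver, 5).until(
        EC.element_to_be_clickable((By.XPATH, '//div/div')))
    myString = element.get_attribute("innerHTML")  # reuse the element the wait returned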
Your other solution:
Your other attempt works because the time.sleep(5) simply papers over the timing issue for Selenium, which is not in line with best practices.