I am pretty new to web-scraping...
For example here is the part of my code:
labels = driver.find_elements(By.CLASS_NAME, 'form__item-checkbox-label.placeFinder-search__checkbox-label')
checkboxes = driver.find_elements(By.CLASS_NAME, 'form__item-checkbox-input.placeFinder-search__checkbox-input')
boxes = zip(labels,checkboxes)
time.sleep(3)
for label,checkbox in boxes:
    if checkbox.is_selected():
        label.click()
Here is another example:
driver.get(product_link)
time.sleep(3)
button = driver.find_element(By.XPATH, '//*[@id="tab-panel__tab--product-pos-search"]/h2')
time.sleep(3)
button.click()
And I am scraping through, let's say, hundreds of products. 90% of the time it works fine, but occasionally it gives errors like "couldn't locate the element" or "something is not clickable". But all these product pages are built the same. Moreover, if I just re-run the code on the product that resulted in the error, most of the time on the 2nd or 3rd try I am able to scrape the data and do not get the error again.
Why does this happen? The code stays the same, the web page stays the same. What is causing the error when it happens? The only thing that comes to my mind is that the Internet connection sometimes lags behind the code and the program is unable to see the elements it is looking for. But as you can see, I have added time.sleep() and it does not always help.
How can this be avoided? It is really annoying to be forced to stay in front of the monitor all day just to supervise and re-run the code. I guess I could just put the scrape function inside a try: except: else: block, but I am still wondering why the same code will sometimes work and sometimes return an error on the same page.
In short Selenium deals with three distinct states of a WebElement.
presence
visible
interactable / clickable
Ideally, to click on any clickable element you need to induce WebDriverWait for the element_to_be_clickable() as follows:
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="tab-panel__tab--product-pos-search"]/h2'))).click()
Similarly you can also create a list of desired elements waiting for their visibility and click on them one by one waiting for each of them to be clickable as follows:
checkboxes = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.CLASS_NAME, "form__item-checkbox-input.placeFinder-search__checkbox-input")))
for checkbox in checkboxes:
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable(checkbox)).click()
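For intuition, an explicit wait is essentially a polling loop: it keeps checking a condition until the condition holds or a timeout expires, rather than pausing for a fixed time the way time.sleep() does. The sketch below is a simplified, pure-Python illustration of that idea only. It is not Selenium's actual implementation, and wait_until and its parameters are hypothetical names chosen for this example.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        # Pause briefly before checking again instead of sleeping blindly.
        time.sleep(poll_interval)
```

The practical difference from a fixed sleep is that the loop returns as soon as the page is ready and only fails after the full timeout, so a slow page load costs a little time instead of raising an error.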
Welcome to the "dirty" side of web automation. Tests like these are called "flaky", in other words fragile, and flakiness is a major disadvantage of Selenium WebDriver.
There can be several reasons for flaky behavior:
Network instability: every command is sent over the network: client -> (Selenium Grid, if used) -> browser driver -> actual browser. Any connection issue along the way may cause a failure.
CSS animations: Selenium executes commands directly, so an animated transition that is still running may cause a command to fail.
Ajax-style requests or dynamically changing elements: with "load more" buttons, or content that appears only after some action, the element may not be detected yet or may still be overlapped by another element.
One last comment: sleep is not a good idea; it goes against best practices. Instead, use Expected Conditions to ensure elements are visible and ready.
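Since some of these causes (network hiccups in particular) are outside the script's control, a long unattended run may also benefit from a bounded retry around the per-product work. The following is a minimal sketch under stated assumptions: run_with_retries, scrape_all and scrape_one are hypothetical names, and scrape_one stands for whatever function scrapes a single product link. It complements expected conditions and does not replace them.

```python
import time


def run_with_retries(task, attempts=3, delay=2.0, retry_on=(Exception,)):
    """Run task() up to `attempts` times, pausing `delay` seconds between tries."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except retry_on as error:
            last_error = error
            if attempt < attempts:
                time.sleep(delay)
    # Every attempt failed: surface the most recent error.
    raise last_error


def scrape_all(links, scrape_one, attempts=3, delay=2.0):
    """Scrape every link; collect results and failures instead of stopping."""
    results, failures = {}, {}
    for link in links:
        try:
            results[link] = run_with_retries(lambda: scrape_one(link), attempts, delay)
        except Exception as error:
            # Remember the failure and move on to the next link.
            failures[link] = error
    return results, failures
```

With something like this, links that still fail after a few attempts end up in failures and can be re-run later, so nobody has to watch the run.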
While I was trying to create a bot to automate shopping on a certain page, I ran into a problem that I couldn't fix for a long time. The goal of the program was to open the page, click the button representing a size, and click the buy button to add the item to the cart. It ran in a loop over every item link that I supplied. The code of the program:
def buy(driver: webdriver, href: str):
    driver.get(href)
    sizeButton = WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.XPATH, f"//span[contains(text(),'{size.upper()}')]/../..")))
    sizeButton.click()
    buyButton = WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.XPATH, "//div[@id='addToCartButton']/button")))
    buyButton.click()
The code worked on the first iteration, but after adding the first item to the cart and switching to the next page, the driver couldn't find the same WebElements. I made sure that the XPaths didn't change, that the new page was in the same window, and that there were no extra iframes. In addition, when the code left out one of the clicks on either button, it worked fine.
After trying many possible fixes, I accidentally ran into a solution: doing both explicit waits first and then calling the click methods.
def buy(driver: webdriver, href: str):
    driver.get(href)
    sizeButton = WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.XPATH, f"//span[contains(text(),'{size.upper()}')]/../..")))
    buyButton = WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.XPATH, "//div[@id='addToCartButton']/button")))
    sizeButton.click()
    buyButton.click()
Can anyone explain to me why the earlier approach didn't work? I lost a lot of time fixing it, so I would love to learn something that helps me avoid such mistakes in the future.
I'm creating an Instagram Unfollow Tool. The code usually works, but occasionally Instagram will show the information of some user (as if I hovered over that user) and cause my code to stop running because it obscures the buttons that I need to click in order to unfollow a user.
It's easier to understand with an example:
How can I edit my code to make the mouseover information go away (is there a way to turn off these mouseover effects)?
My code:
for i in range(num_following):
    following = self.browser.find_element_by_xpath("/html/body/div[5]/div/div/div[2]/ul/div/li[{}]/div/div[1]/div[2]/div[1]/span/a".format(i))
    following_username = following.get_attribute("title")
    if following_username not in followers:
        # Unfollow account
        following_user_button = self.browser.find_element_by_xpath("/html/body/div[5]/div/div/div[2]/ul/div/li[{}]/div/div[2]/button".format(i))
        following_user_button.click()
        unfollow_user_button = self.browser.find_element_by_xpath("/html/body/div[6]/div/div/div/div[3]/button[1]")
        unfollow_user_button.click()
        print("You've unfollowed {}.".format(following_username))
The error I get:
selenium.common.exceptions.ElementClickInterceptedException: Message: Element is not clickable at point (781,461) because another element obscures it
It seems that the element you want to call .click() on, unfollow_user_button = self.browser.find_element_by_xpath("/html/body/div[6]/div/div/div/div[3]/button[1]"), is blocked by a temporary or permanent overlay.
In such cases you can either wait, using an explicit wait with expected conditions, until the blocking element has vanished (though that probably won't work in this specific case since, to my knowledge, the popup remains if nothing is done), or send the click directly to the element by using the JavascriptExecutor:
#Find the element - by_xpath or alike
unfollow_user_button = driver.find_element_by_xpath("XPATH")
#Sending the click via JavascriptExecutor
driver.execute_script("arguments[0].click();", unfollow_user_button)
Note two things:
driver must obviously be an instance of WebDriver.
I would suggest not using the absolute XPath in general. Going with the relative XPath is less prone to be broken by small changes in the site structure. Click here for a small guide to read through.
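The two approaches can also be combined: try the normal click first and only fall back to the JavaScript click when the normal one is intercepted. This is a rough sketch rather than a definitive recipe. click_with_js_fallback and its intercepted_errors parameter are hypothetical names; by default the function catches any exception so that it runs without Selenium installed, and with Selenium you would probably pass selenium.common.exceptions.ElementClickInterceptedException instead.

```python
def click_with_js_fallback(driver, element, intercepted_errors=(Exception,)):
    """Try a normal click; if it is intercepted, click through JavaScript.

    Returns "native" or "javascript" to show which path was used.
    """
    try:
        element.click()
        return "native"
    except intercepted_errors:
        # Same script as in the answer above: arguments[0] is the element.
        driver.execute_script("arguments[0].click();", element)
        return "javascript"
```

Keeping the native click as the first choice means the script still behaves like a real user whenever nothing is in the way.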
I am trying to make a scraper that will go through a bunch of links, export the guide as a PDF, and loop through all the guides that are in the parent folder. It works fine going in, but when I try to go backwards, it throws stale element exceptions, even when I make sure to refresh the elements in the code, or refresh the page.
from selenium import webdriver
import time, bs4
browser = webdriver.Firefox()
browser.get('MYURL')
loginElem = browser.find_element_by_id('email')
loginElem.send_keys('LOGIN')
pwdElem = browser.find_element_by_id('password')
pwdElem.send_keys('PASSWORD')
pwdElem.submit()
time.sleep(3)
category = browser.find_elements_by_class_name('title')
for i in category:
    i.click()
    time.sleep(3)
    guide = browser.find_elements_by_class_name('cell')
    for j in guide:
        j.click()
        time.sleep(3)
        soup = bs4.BeautifulSoup(browser.page_source, features="html.parser")
        guidetitle = soup.find_all(id='guide-intro-title')
        print(guidetitle)
        browser.find_element_by_link_text('Options').click()
        time.sleep(0.5)
        browser.find_element_by_partial_link_text('Download PDF').click()
        browser.find_element_by_id('download').click()
        browser.execute_script("window.history.go(-2)")
        print("went back")
        time.sleep(5)
        print("waited")
        guide = browser.find_elements_by_class_name('thumb')
        print("refreshed elements")
    print("made it to outer loop")
This happens whether I use a script to move the browser back or the driver.back() method. I can see that it makes it back to the child directory, then waits, and refreshes the elements. But then it can't seem to load the new element to go into the next guide. I found a similar question here on SO, but someone just provided code tailored to the problem instead of explaining, so I am still confused.
I also know about WebDriverWait, but I am just using sleep for now since I don't fully understand the EC wait conditions. In any case, increasing the sleep time doesn't fix this issue.
Stale Element Reference Exception occurs upon page refresh because of an element UUID change in the DOM.
How to avoid it: Always try to search for an element right before interaction.
In your code, you searched for cells, found them and stored them in guide. So now, guide has a list of selenium UUIDs. But then, you are making a loop to go through the list, and upon each refresh (that happens when you do back I believe), cell's UUID changes, so old ones that you have stored are no longer attached to the DOM. When trying to interact with them, Selenium cannot find them in the DOM and throws this exception.
Instead of looping through guide your way, try re-finding the element every time, like:
guide = browser.find_elements_by_class_name('cell')
for j in range(len(guide)):
    browser.find_elements_by_class_name('cell')[j].click()
Note, it looks like category might have a similar problem, so try applying this solution to category as well.
Hope this helps. Here is a similar issue and a solution.
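The same re-find-by-index idea can be wrapped in a small function. This is only a sketch under stated assumptions: visit_each and handle_page are hypothetical names, the driver is expected to offer find_elements(by, value) and back() the way a Selenium WebDriver does, and by/value are the usual locator arguments (for example By.CLASS_NAME and 'cell'). A real page may additionally need a wait after going back.

```python
def visit_each(driver, by, value, handle_page):
    """Click every element matching (by, value), re-locating it each time.

    After each click, handle_page(driver) runs on the new page and
    driver.back() returns to the listing.
    """
    count = len(driver.find_elements(by, value))
    for index in range(count):
        # Look the elements up again: references found before the
        # navigation are stale once the page has been reloaded.
        elements = driver.find_elements(by, value)
        if index >= len(elements):
            break  # the listing shrank; nothing left to click
        elements[index].click()
        handle_page(driver)
        driver.back()
```

Only the index survives between iterations, so no element reference is ever used after the page it came from has been replaced.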
I am trying to understand Python in general as I just switched over from using VBA. I am interested in the possible ways you could approach this single issue. I already went around it by just going to the link directly, but I need to understand and apply it here.
from selenium import webdriver
chromedriver = r'C:\Users\dd\Desktop\chromedriver.exe'
browser = webdriver.Chrome(chromedriver)
url = 'https://www.fake.com/'
browser.get(url)
browser.find_element_by_id('txtLoginUserName').send_keys("Hello")
browser.find_element_by_id('txtLoginPassword').send_keys("There")
browser.find_element_by_id('btnLogin').click()
At this point, I am trying to navigate to a particular button/link.
Here is the info from the page/element
T-Mobile
Here are some of the things I tried:
for elem in browser.find_elements_by_xpath("//*[contains(text(), 'T-Mobile')]"):
    elem.click
browser.execute_script("InitiateCallBack(187, True, T-Mobile, https://www.fake.com/, TMobile)")
I also attempted to look for tags and use css selector all of which I deleted out of frustration!
Specific questions
How do I utilize the innertext,"T-Mobile", to click the button?
How would I execute the onclick event?
I've tried to read the following links, but still have not succeeded in coming up with a different way. Part of it is probably because I don't understand the specific syntax yet. These are just some of the things I looked at. I spent about 3 hours trying various things before I came here!
selenium python onclick() gives StaleElementReferenceException
http://selenium-python.readthedocs.io/locating-elements.html
Python: Selenium to simulate onclick
https://stackoverflow.com/questions/43531654/simulate-a-onclick-with-selenium-
https://stackoverflow.com/questions/45360707/python-selenium-using-onclick
Running javascript in Selenium using Python
How do I utilize the innertext,"T-Mobile", to click the button?
find_elements_by_link_text would be appropriate for this case.
elements = driver.find_elements_by_link_text('T-Mobile')
for elem in elements:
    elem.click()
There's also a by_partial_link_text locator as well if you don't have the full exact text.
How would I execute the onclick event?
The simplest way would be to simply call .click() on the element as shown above and the event should, naturally, execute at that time.
Alternatively, you can retrieve the onclick attribute and use driver.execute_script to run the js.
for elem in elements:
    script = elem.get_attribute('onclick')
    driver.execute_script(script)
Edit:
note that in your code you did element.click -- this does nothing. element.click() (note the parens) calls the click method.
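The difference between the two spellings is plain Python behavior and can be seen without a browser. In this small illustration, Button is a hypothetical stand-in for a web element, not a Selenium class.

```python
class Button:
    """Stand-in for a web element; counts how often it is clicked."""

    def __init__(self):
        self.clicks = 0

    def click(self):
        self.clicks += 1


button = Button()

button.click          # only looks up the method; nothing is called
clicks_after_reference = button.clicks

button.click()        # the parentheses actually call the method
clicks_after_call = button.clicks

print(clicks_after_reference, clicks_after_call)  # prints: 0 1
```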
is there a way to utilize browser.execute_script() for the onclick event
execute_script can fire the equivalent event, but there may be more listeners that you miss by doing this. Using the element click method is the most sound. There may very well be many implementation details of the site that may hinder your automation efforts, but those possibilities are endless. Without seeing the actual context, it's hard to say.
You can use JS methods to click an element or otherwise interact with the page, but you may miss certain event listeners that occur when using the site 'normally'; you want to emulate, more or less, the normal use as closely as possible.
As per the HTML you have shared, it's pretty clear the website uses JavaScript. So to click() on the link with the text T-Mobile, you have to induce WebDriverWait with the expected_conditions clause element_to_be_clickable, and you can use the following code block:
WebDriverWait(driver, 20).until(expected_conditions.element_to_be_clickable((By.XPATH, "//a[contains(.,'T-Mobile')]"))).click()
you can use it
<div class="button c_button s_button" onclick="submitForm('rMTF')" style="margin-bottom: 30px;">
<input class="v_small" type="button"></input>
<span>
Reset
</span>
I'm using selenium webdriver with python so as to find an element and click it. This is the code. I'm passing 'number' to this code's method and this doesn't work. I see it on the browser that the element is found but it doesn't click the element.
subIDTypeIcon = "//a[@id='s_%s_IdType']/img" % str(number)
self.driver.find_element_by_xpath(subIDTypeIcon).click()
Whereas, I tried placing the 'self.driver.find_.....' twice and to my surprise it works
subIDTypeIcon = "//a[@id='s_%s_IdType']/img" % str(number)
self.driver.find_element_by_xpath(subIDTypeIcon).click()
self.driver.find_element_by_xpath(subIDTypeIcon).click()
I have the browser getting opened on a remote server so there is sometimes timeout problem.
Is there a proper way to make this work? Why does it work when the same statement is placed twice?
This is a common problem and the main reason to create abstract, per page helper classes. Instead of blindly finding elements, you usually need a loop which tries to find an element for a couple of seconds so the browser can update the DOM.
The second version often works because starting to load a new page doesn't invalidate the DOM. That only happens when the remote server has started to send enough of the new document to the browser. You can see this yourself when you use a browser: Pages don't become blank the same instant you click on a link. Instead, they stay for a while.
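A per-page helper with such a retry loop might look roughly like the sketch below. PageHelper, its method names and its default timings are all assumptions made for illustration; the only thing it expects from the driver is a find_elements(by, value) method that returns a list, which is how Selenium's WebDriver behaves.

```python
import time


class PageHelper:
    """Wraps a driver and retries element lookups for a few seconds."""

    def __init__(self, driver, timeout=5.0, poll_interval=0.25):
        self.driver = driver
        self.timeout = timeout
        self.poll_interval = poll_interval

    def find(self, by, value):
        """Return the first match, polling until one appears or time runs out."""
        deadline = time.monotonic() + self.timeout
        while True:
            matches = self.driver.find_elements(by, value)
            if matches:
                return matches[0]
            if time.monotonic() >= deadline:
                raise LookupError(f"no element found for {by}={value!r}")
            time.sleep(self.poll_interval)

    def click(self, by, value):
        """Find the element (with retries) and click it."""
        self.find(by, value).click()
```

In the question's code this would presumably be used as PageHelper(self.driver).click(By.XPATH, subIDTypeIcon), which gives the remote browser a few seconds to update the DOM instead of repeating the same statement twice.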