https://squidindustries.co/checkout
checkout_cc_number = driver.find_element_by_id("number")
checkout_cc_number.send_keys(card_number)
When I try to input information into the card number field I get an error saying the element could not be located. I tried using time.sleep and driver.implicitly_wait when I first got to the page, but both failed. Any ideas?
The element is in a frame (i.e. a webpage within a webpage). Selenium will look for elements in the page it has loaded and not within frames. That's the problem.
To solve this we just need a bit more code, which will tell Selenium to look in the frame.
The example you've given is several pages deep into a shopping cart, so I'm going to use a much more accessible example instead: the Mozilla guide to iframes.
Here is some code to open that page and then click the CSS button within the frame:
from selenium import webdriver
import time

browser = webdriver.Chrome()
browser.get(r"https://developer.mozilla.org/en-US/docs/Web/HTML/Element/iframe")
time.sleep(5)  # crude wait for the page and its iframe to load

# Find the iframe, then switch Selenium's context into it
browser.switch_to.frame(browser.find_element_by_class_name("interactive"))

# Element lookups now run against the frame's document
css_button = browser.find_element_by_id("css")
css_button.click()

# Switch back to the main page
browser.switch_to.default_content()
There are two lines that are important. The first one is:
browser.switch_to.frame(browser.find_element_by_class_name("interactive"))
That finds the frame and then switches to it. Once we have done that, any code that looks for elements will be looking in the frame and not in the page that we navigated to. That is what you need to do to access the number element. In your example the class of the frame is card-fields-iframe, so use that instead of interactive.
The second important line is:
browser.switch_to.default_content()
That reverts the previous line. So now Selenium will be looking for elements within the page that we navigated to. You'll want to do that after interacting with the frame, so that you can continue through the shopping cart.
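Applied to your checkout, a minimal sketch (reusing your driver and card_number variables, and the card-fields-iframe class from your page) would be:

# Switch into the first iframe carrying the card-fields-iframe class
driver.switch_to.frame(driver.find_element_by_class_name("card-fields-iframe"))
checkout_cc_number = driver.find_element_by_id("number")
checkout_cc_number.send_keys(card_number)
driver.switch_to.default_content()  # back to the main checkout page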
Have you tried getting the input element using the DOM? What happens if you do document.getElementById('number')?
I ran into the same issue, and with checkouts, as you mentioned, all the iframe class names are the same. What I did was get all the iframes with the same class name as a list:
iframes = driver.find_elements(By.CLASS_NAME, "card-fields-iframe")
I then switched through the iframes referencing each one by its place in the list. Since there are only four fields in the checkout, the list is only 4 elements long, starting with [0].
driver.switch_to.frame(iframes[0])
number = driver.find_element(By.ID, "number")
if number.is_displayed():  # is_displayed is a method, so it needs parentheses
    number.send_keys("4000300040005000")
driver.switch_to.default_content()
It's important to note that switching back to the default content, using driver.switch_to.default_content(), before switching to the next frame is the only way I was able to make this work. The is_displayed() method just checks whether the element is rendered on the page or not.
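Putting that together for all four fields, a minimal sketch might look like this (the "number" ID comes from the question; the other three field IDs are assumptions you would need to check against the actual iframes):

from selenium.webdriver.common.by import By

# Hypothetical IDs and values for the four card fields; only "number"
# is confirmed by the question, the rest are placeholders
fields = [("number", "4000300040005000"), ("name", "J DOE"),
          ("expiry", "12 / 25"), ("verification_value", "123")]

for index, (field_id, value) in enumerate(fields):
    # Re-find the iframes each time, then switch into the one we need
    iframes = driver.find_elements(By.CLASS_NAME, "card-fields-iframe")
    driver.switch_to.frame(iframes[index])
    field = driver.find_element(By.ID, field_id)
    if field.is_displayed():
        field.send_keys(value)
    # Switch back out before moving on to the next iframe
    driver.switch_to.default_content()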
I am pretty new to web-scraping...
For example, here is part of my code:
labels = driver.find_elements(By.CLASS_NAME, 'form__item-checkbox-label.placeFinder-search__checkbox-label')
checkboxes = driver.find_elements(By.CLASS_NAME, 'form__item-checkbox-input.placeFinder-search__checkbox-input')
boxes = zip(labels, checkboxes)
time.sleep(3)
for label, checkbox in boxes:
    if checkbox.is_selected():
        label.click()
Here is another example:
driver.get(product_link)
time.sleep(3)
button = driver.find_element(By.XPATH, '//*[@id="tab-panel__tab--product-pos-search"]/h2')
time.sleep(3)
button.click()
And I am scraping through, let's say, hundreds of products. 90% of the time it works fine, but occasionally it gives errors like "could not locate the element" or "element is not clickable", etc. Yet all these product pages are built the same. Moreover, if I just re-run the code on the product that resulted in the error, most of the time from the 2nd or 3rd attempt I will be able to scrape the data and will not get the error back.
Why does this happen? The code stays the same, the web page stays the same. What causes the error when it happens? The only thing that comes to my mind is that the Internet connection sometimes falls behind the code and the program is unable to see the elements it is looking for... But as you can see I have added time.sleep(), and it does not always help...
How can this be avoided? It is really annoying to be forced to stay in front of the monitor all day just to supervise and re-run the code... I mean, I guess I could just put the scrape function inside a try: except: else: block, but I am still wondering why the same code will sometimes work and sometimes return an error on the same page?
In short, Selenium deals with three distinct states of a WebElement:
presence
visible
interactable / clickable
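These three states map onto the standard expected conditions; a minimal sketch, reusing the checkbox locator from your own snippet:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(driver, 10)
locator = (By.CLASS_NAME, "form__item-checkbox-input.placeFinder-search__checkbox-input")
element = wait.until(EC.presence_of_element_located(locator))    # attached to the DOM
element = wait.until(EC.visibility_of_element_located(locator))  # present and rendered visible
element = wait.until(EC.element_to_be_clickable(locator))        # visible and enabled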
Ideally, to click on any clickable element you need to induce WebDriverWait for the element_to_be_clickable() as follows:
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="tab-panel__tab--product-pos-search"]/h2'))).click()
Similarly, you can create a list of the desired elements by waiting for their visibility, and then click them one by one, waiting for each of them to become clickable:
checkboxes = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.CLASS_NAME, "form__item-checkbox-input.placeFinder-search__checkbox-input")))
for checkbox in checkboxes:
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable(checkbox)).click()
Welcome to the "dirty" side of web automation. We call these "flaky" tests; in other words, they are "fragile". This is the major disadvantage of Selenium WebDriver.
There are several possible reasons for a flaky run:
Network instability: every command is sent over the network: client -> (Selenium Grid, if used) -> browser driver -> actual browser. Any connection issue can cause a failure.
CSS animations: commands execute immediately, so if you have animated transitions, an element may not yet be in its final position when Selenium interacts with it, causing the action to fail.
Ajax requests and dynamic element changes: if elements appear only after some action (e.g. a "load more" button), Selenium may not detect them yet, or they may still be overlapped by something else.
One last comment: sleep is not a good idea to use; in fact, it goes against best practices. Instead, use expected conditions to ensure elements are visible and ready, as sketched below.
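A minimal sketch of that pattern, using a hypothetical click_when_ready helper (not part of Selenium) around the locator from the question: it replaces the fixed sleep with an explicit wait and adds a small retry for the occasional stale reference:

from selenium.common.exceptions import StaleElementReferenceException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def click_when_ready(driver, locator, timeout=10, retries=3):
    # Wait until the element is clickable instead of sleeping a fixed time;
    # retry a few times in case it goes stale between lookup and click
    for attempt in range(retries):
        try:
            WebDriverWait(driver, timeout).until(
                EC.element_to_be_clickable(locator)).click()
            return
        except StaleElementReferenceException:
            if attempt == retries - 1:
                raise

click_when_ready(driver, (By.XPATH, '//*[@id="tab-panel__tab--product-pos-search"]/h2'))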
I'm a beginner learning web scraping with Selenium. Recently I faced the problem that sometimes there are button elements that do not have an "href" attribute with a link to the website they lead to. In order to obtain the link, or useful information from that link, I need to click on the button and get the current URL in the new window using the "current_url" method. However, it doesn't always work, such as when the new URL is not valid. I'm asking for help on a solution.
To give you an example, say one wants to obtain the Spotify link to the song listed on https://www.what-song.com/Tvshow/100242/BoJack-Horseman/e/116712. After clicking on the Spotify button, instead of being directed to the Spotify web player, I see a new window popping up with the URL "spotify:track:6ta5yavnnEfCE4faU0jebM". It's not valid, probably due to some error made by the website, but the identifier "6ta5yavnnEfCE4faU0jebM" is still useful, so I want to obtain it.
However, when I try using the "current_url" method, it gives me the original link "https://www.what-song.com/Tvshow/100242/BoJack-Horseman/e/116712" instead of the invalid URL. My code is attached below. Note that I already have a time.sleep.
Specs: macOS 12.6, Chrome and ChromeDriver version 106.something, Python 3.
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
import time

s = Service('/web_scraping/chromedriver')
driver = webdriver.Chrome(service=s)
wait = WebDriverWait(driver, 3)
driver.get('https://www.what-song.com/Tvshow/100242/BoJack-Horseman/e/116712')
spotify_button_element = driver.find_element("xpath", '/html/body/div/div[2]/main/div[2]/div/div[1]/div[5]/div[1]/div[2]/div/div/div[2]/div/div[1]/button[3]')
driver.execute_script("arguments[0].click();", spotify_button_element)
time.sleep(3)
print(driver.current_url)
Any idea on why this happens and how to fix it? Huge thanks in advance!
The "spotify:track:…" value is a custom URI scheme that the browser hands off to the operating system (to open the Spotify app) rather than navigating the tab, which is why current_url still reports the original page. What you could do instead of finding the button to click and opening a new tab is the following:
import json

# get the data stored in the script tag with id '__NEXT_DATA__'
spotify_data_request = driver.find_element("id", '__NEXT_DATA__')
# parse the JSON string into a dict-like object
temp = json.loads(spotify_data_request.get_attribute('innerHTML'))
# read the ID directly instead of clicking the Spotify button and parsing the URL
print(temp['props']['pageProps']['episode']['songs'][0]['song']['spotifyId'])
I am trying to make a scraper that will go through a bunch of links, export the guide as a PDF, and loop through all the guides that are in the parent folder. It works fine going in, but when I try to go backwards, it throws stale element exceptions, even when I make sure to refresh the elements in the code or refresh the page.
from selenium import webdriver
import time, bs4
browser = webdriver.Firefox()
browser.get('MYURL')
loginElem = browser.find_element_by_id('email')
loginElem.send_keys('LOGIN')
pwdElem = browser.find_element_by_id('password')
pwdElem.send_keys('PASSWORD')
pwdElem.submit()
time.sleep(3)
category = browser.find_elements_by_class_name('title')
for i in category:
    i.click()
    time.sleep(3)
    guide = browser.find_elements_by_class_name('cell')
    for j in guide:
        j.click()
        time.sleep(3)
        soup = bs4.BeautifulSoup(browser.page_source, features="html.parser")
        guidetitle = soup.find_all(id='guide-intro-title')
        print(guidetitle)
        browser.find_element_by_link_text('Options').click()
        time.sleep(0.5)
        browser.find_element_by_partial_link_text('Download PDF').click()
        browser.find_element_by_id('download').click()
        browser.execute_script("window.history.go(-2)")
        print("went back")
        time.sleep(5)
        print("waited")
        guide = browser.find_elements_by_class_name('thumb')
        print("refreshed elements")
    print("made it to outer loop")
This happens whether I use a script to move the browser back or the driver.back() method. I can see that it makes it back to the child directory, then waits and refreshes the elements. But then it can't seem to load the new element to go into the next guide. I found a similar question here on SO, but the answer just provided code tailored to that problem instead of explaining, so I am still confused.
I also know about WebDriverWait, but I am just using sleep for now since I don't fully understand the EC wait conditions. In any case, increasing the sleep time doesn't fix this issue.
A Stale Element Reference Exception occurs upon page refresh because the element's internal reference (UUID) in the DOM changes.
How to avoid it: always search for an element right before interacting with it.
In your code, you searched for cells, found them, and stored them in guide. So now guide holds a list of Selenium element references (UUIDs). But then you loop through the list, and upon each refresh (which happens when you go back, I believe), each cell's UUID changes, so the old ones you stored are no longer attached to the DOM. When you try to interact with them, Selenium cannot find them in the DOM and throws this exception.
Instead of looping through guide your way, try re-finding the element on every iteration, like:
guide = browser.find_elements_by_class_name('cell')
for j in range(len(guide)):
    browser.find_elements_by_class_name('cell')[j].click()
Note, it looks like category might have a similar problem, so try applying this solution to category as well.
Hope this helps. Here is a similar issue and a solution.
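Applied to your loops, a minimal sketch (keeping your sleeps and eliding the PDF-export steps) could look like this:

# Re-find both category and guide elements by index on every iteration,
# so no stored reference can go stale after navigating back
num_categories = len(browser.find_elements_by_class_name('title'))
for i in range(num_categories):
    browser.find_elements_by_class_name('title')[i].click()
    time.sleep(3)
    num_guides = len(browser.find_elements_by_class_name('cell'))
    for j in range(num_guides):
        browser.find_elements_by_class_name('cell')[j].click()
        time.sleep(3)
        # ... export the PDF as before ...
        browser.execute_script("window.history.go(-2)")
        time.sleep(5)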
I'm trying to understand how to scrape some dynamic webpages, but I'm unable to get it to work.
(The page I'm currently playing with is betfair.com, which on their live-betting soccer page has a dynamic match statistics panel. To see it in action, go to betfair.com -> Odds -> Live Betting and click on any soccer match.)
It is embedded inside two iframes, which I can access using:
frame1 = browser.find_element_by_xpath('//iframe[contains(@class, "player")]')
browser.switch_to.frame(frame1)
frame2 = browser.find_element_by_xpath('//iframe[contains(@id, "playerFrame")]')
browser.switch_to.frame(frame2)
I get an iframe back and can switch to it. So far so good. However, when I now try to use 'browser' for anything, I get no response whatsoever.
Is there anything else one needs to do in order to read from the content? I'm trying something like:
browser.find_element_by_xpath("//div[contains(@id, 'in-game-stats')]")
The inner iframe above does contain the id. Also, if I try the steps above manually using Chrome dev tools, it does work. Any clues on why I get no answer to the above? Do I need to wait for something before it becomes available?
There is a third iframe underneath your frame2; select that before requesting in-game-stats. All together:
frame1 = browser.find_element_by_xpath('//iframe[contains(@class, "player")]')
browser.switch_to.frame(frame1)
frame2 = browser.find_element_by_xpath('//iframe[contains(@id, "playerFrame")]')
browser.switch_to.frame(frame2)
You can try to find a more robust way of identifying this last iframe; here I am going to index it as the first iframe under frame2.
frame3 = browser.find_element_by_xpath('//iframe[1]')
browser.switch_to.frame(frame3)
Now you can get the node that you were looking for:
browser.find_element_by_xpath("//div[contains(@id, 'in-game-stats')]")
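Once you have read what you need, remember to switch back out of the nested frames before interacting with the top-level page again:

browser.switch_to.default_content()  # back to the top-level page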
I want to crawl the dialogue text in a popup window. The problem is that after I trigger the link, the window appears, but it seems that the Selenium driver cannot handle it automatically: as I learned to do from other questions on this site, I checked driver.window_handles.
The source of the trigger:
The value of len(driver.window_handles) is 1. I thought I could get the window element and then get the text via get_attribute; fortunately, I succeeded in getting the element with:
wd = driver.find_element_by_css_selector('div[node-type="repeat_list"]')
selenium.webdriver.remote.webelement.WebElement (session="f810cbbe-db43-4e8d-b484-664559ec8efc", element="{dd00e689-7991-44e9-85d3-76c69e79218f}")
But the sad thing is I don't know how to get all the content out of it, since I don't know the attributes of its children.
I'm not certain it's a dialogue; a front-end engineer told me it looks like an animation. Anyway, this is the source snippet:
PS: the browser is Firefox.
I thought it might violate the site's Acceptable Use Policy to crawl it, so I have hidden some information. Sorry.
Once you have your parent element:
wd = driver.find_element_by_css_selector('div[node-type="repeat_list"]')
you can continue calling methods on this object and, in this way, reach the child elements. You can use find_element_by_xpath or find_element_by_class_name, for example:
wd = driver.find_element_by_css_selector('div[node-type="repeat_list"]')
# find_element(s)_by_class_name accepts only a single class name, so use a
# CSS selector for the compound class "list_li S_line1 clearfix"
items = wd.find_element_by_class_name("list_box").find_element_by_class_name("list_ul").find_elements_by_css_selector(".list_li.S_line1.clearfix")
and so on, until you reach the desired element down the hierarchy and extract its content as you wish.
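For example, a minimal sketch that prints the rendered text of each list item (assuming the visible text is what you are after):

items = wd.find_elements_by_css_selector(".list_li.S_line1.clearfix")
for item in items:
    print(item.text)  # .text returns the element's visible text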
I hope this helps!