I'm trying to understand how to scrape some dynamic webpages,
but I'm unable to get it to work.
(The page I'm currently playing with is betfair.com, which on their live betting
soccer page have a dynamic match statistics page. To see it in action, go to
betfair.com->Odds->LiveBetting, click on any soccer match.)
It is embedded inside two iframes, which I can access using:
frame1 = browser.find_element_by_xpath('//iframe[contains(#class, "player")]')
browser.switch_to.frame(frame1)
frame2 = browser.find_element_by_xpath('//iframe[contains(#id, "playerFrame")]')
browser.switch_to.frame(frame2)
I get an iframe back and can switch to it. So far so good.
However, when I now try to use 'browser' for anything,
I get no response what so ever.
Is there anything else one need to do in order to read form the content?
I'm trying something like:
browser.find_element_by_xpath("//div[contains(#id, 'in-game-stats')]")
The inner iframe above does contain the id. Also, if I try the steps above manual using chrome dev tools it does work. Any clues on why I get no answer to the above? Do I need to wait for something before it becomes available?
There is a third iframe underneath your frame2, select that before requesting for in-game-stats. All together,
frame1 = browser.find_element_by_xpath('//iframe[contains(#class, "player")]')
browser.switch_to.frame(frame1)
frame2 = browser.find_element_by_xpath('//iframe[contains(#id, "playerFrame")]')
browser.switch_to.frame(frame2)
You can try to get a better way of identifying this last iframe, here I am going to index it as the first iframe under iframe2.
frame3 = browser.find_element_by_xpath('//iframe[1]')
browser.switch_to.frame(frame3)
Now you can get the node that you were looking for:
browser.find_element_by_xpath("//div[contains(#id, 'in-game-stats')]")
Related
https://squidindustries.co/checkout
checkout_cc_number = driver.find_element_by_id("number")
checkout_cc_number.send_keys(card_number)
When I try to input information into the card number field I get an error saying the element could not be located. I tried using time.sleep and driver.implicitly_wait when i first got to the page but both failed. Any ideas?
The element is in a frame (i.e. a webpage within a webpage). Selenium will look for elements in the page it has loaded and not within frames. That's the problem.
To solve this we just need a bit more code, which will tell Selenium to look in the frame.
The example you've given is several pages deep into a shopping cart, so I'm going to use a much more accessible example instead: the mozilla guide to iframes.
Here is some code to open that page and then click the CSS button within the frame:
from selenium import webdriver
import time
browser = webdriver.Chrome()
browser.get(r"https://developer.mozilla.org/en-US/docs/Web/HTML/Element/iframe")
time.sleep(5)
browser.switch_to.frame(browser.find_element_by_class_name("interactive"))
css_button = browser.find_element_by_id("css")
css_button.click()
browser.switch_to.default_content()
There are two lines that are important. The first one is:
browser.switch_to.frame(browser.find_element_by_class_name("interactive"))
That finds the frame and then switches to it. Once we have done that, any code that looks for elements will be looking in the frame and not in the page that we navigated to. That is what you need to do to access the number element. In your example the class of the frame is card-fields-iframe, so use that instead of interactive.
The second important line is:
browser.switch_to.default_content()
That reverts the previous line. So now Selenium will be looking for elements within the page that we navigated to. You'll want to do that after interacting with the frame, so that you can continue through the shopping cart.
have you tried getting the input element using the DOM? what happens if you do document.getElementById('number') ?
I ran into the same issue, and with checkouts, as you mentioned, all the iframe class names are the same. What I did was get all the iframes with the same class name as a list:
iframes = driver.find_elements(By.CLASS_NAME, "card-fields-iframe")
I then switched through the iframes referencing each one by its place in the list. Since there are only four fields in the checkout, the list is only 4 elements long, starting with [0].
driver.switch_to.frame(iframes[0])
number = driver.find_element(By.ID, "number")
if number.is_displayed:
number.send_keys("4000300040005000")
driver.switch_to.default_content()
It's important to note that switching back to the default content, using driver.switch_to.default_content(), before switching to the next frame is the only way I was able to make this work. The is_displayed function just checks to see whether the element is on the page or not.
I am using Selenium along with python to scrape some pages. I have many web pages that represent the same type of objects(football player information) but each of them has a slightly different HTML layout. In particular my main issue here is that the div class identifiers change when refreshing or changing web page, in a way which is unpredictable.
In the specific case I would like to get the data in the div which class identifier "jss176", but when I get to another player this will change to "jss450" for example, with no meaningful pattern to be found.
Is there a way I can go around this? I was thinking of navigating through the Childs starting from div with id = "root", but I don't seem to find a good piece of code to achieve this.
Thank you very much!
If only the id's change, but not the web structure, you could scrape the info by XPATH.
https://www.tutorialspoint.com/what-is-xpath-in-selenium-with-python
You can directly access the div you want and select in chrome "copy XPATH" option in the browser.
I am trying to scrape information off websites like this:
https://www.glassdoor.com/Overview/Working-at-7-Eleven-EI_IE3581.11,19.htm
using python + beautifulsoup + mechanize.
Accessing anything on the main-site is no problem. However, I also need the information that appears in a overlay-window that appears when one clicks on the "Rating Trends" button next to the bar with stars.
This overlay-window can also be accessed directly by using the url:
https://www.glassdoor.com/Reviews/7-Eleven-Reviews-E3581.htm#trends-overallRating
The html associated with this page is a modification of the original site's html.
However, regardless of what element I try to find (via findAll ) on that overlay-window website, beautifulsoup returns zero hits.
How can I fix this? I tried adding a sleep time between accessing the website and reading anything in, to no avail.
Thanks!
If you're using the Chrome browser select the background of that page (without the additional information displayed) and select 'Inspect' from the context menu (for Windows anyway), then the 'Network' tab, so that you can see network traffic. Now click on 'Rating trends'. The entry marked 'xhr' will be https://www.glassdoor.ca/api/employer/3581-rating.htm?locationStr=&jobTitleStr=&filterCurrentEmployee=false&filterEmploymentStatus=REGULAR&filterEmploymentStatus=PART_TIME (I much hope!) and its contents will be the following.
{"employerId":3581,"ratings":[{"hasRating":true,"type":"overallRating","value":2.9},{"hasRating":true,"type":"ceoRating","value":0.54},{"hasRating":true,"type":"bizOutlook","value":0.35},{"hasRating":true,"type":"recommend","value":0.4},{"hasRating":true,"type":"compAndBenefits","value":2.4},{"hasRating":true,"type":"cultureAndValues","value":2.5},{"hasRating":true,"type":"careerOpportunities","value":2.5},{"hasRating":true,"type":"workLife","value":2.4},{"hasRating":true,"type":"seniorManagement","value":2.3}],"week":0,"year":0}
Whether this URL can be altered for use in obtaining information for other employers, I regret, I cannot tell you.
I want to crawl the dialogue text in a popup window. The problem is that after I triggered the link the window appears but it seems that the selenium driver cannot handle it automatically as I learned from other questions on this site by entering driver.window_handles.
The source of the trigger:
The value of len(driver.window_handles) is 1. I thought I can get the window element and then get the text via the get_attributes, fortunately I succeeded getting the element by
wd = driver.find_element_by_css_selector('div[node-type="repeat_list"]')
selenium.webdriver.remote.webelement.WebElement (session="f810cbbe-db43-4e8d-b484-664559ec8efc", element="{dd00e689-7991-44e9-85d3-76c69e79218f}")
But the sad thing is I don't know how to get all the stuff out from it since I don't know their attributes.
I'm not certain if it's a dialogue, a front end engineer told me that it looks like an animation. Anyway this is the source snippet:
PS: the browser is Firefox.
I thought it may violate the site's Acceptable Use Policy to crawl then I should hide some information. Sorry.
Once you have your parent element :
wd = driver.find_element_by_css_selector('div[node-type="repeat_list"]')
you can continue calling methods on this object, and in this order reach the children elements, you can use find element_by_xpath, or find element_by_class name, for example:
wd = driver.find_element_by_css_selector('div[node-type="repeat_list"]')
wd.find_element_by_class_name("list_box").find_element_by_class_name("list_ul").find_elements_by_class_name("list_li S_line1 clearfix")
and so on until you reach the desired element down the hierarchy and extract it's content as you wish.
I hope this helps!
I've been trying to log into a website using selenium. I have the following problem > i manage to open an iframe (it is correctly displayed after i click on the login button) but i've been unable to move into it. I've been trying to find it via it's Id and name but i somehow can't find it...
The line looks like:
driver.switch_to.frame(driver.find_element_by_id("frameid"))
I get an error "no such element". I checked 1000 times the id and it is correct! The frame i want to switch to is not inside another frame.
Any suggestions ?
It seem's that the id is somehow dynamically allocated but always starts with the same characters. I can't use a regex-based search apparently
In this case you can locate it with an XPath and starts-with() function:
frame = driver.find_element_by_xpath("//iframe[starts-with(#id, 'something')]")
driver.switch_to.frame(frame)
Or, with a CSS "begin with" selector:
frame = driver.find_element_by_css_selector("iframe[id^=something]")
driver.switch_to.frame(frame)