I have been trying to get access to the 4 images on this page: https://altkirch-alsace.fr/serviable/demarches-en-ligne/prendre-un-rdv-cni/
However, the grey region seems to be Ajax-loaded (according to its class name). I want to get the element <div id="prestations"> inside of it, but I can't access it, nor any other element within the grey area.
I have tried to follow several answers to similar questions, but no matter how long I wait, I get an error that the element is not found. The element is there when I click "Inspect element", but I don't see it when I click "View source". Does that mean I can't access it through Selenium?
Here is my code:
from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.common.by import By
driver = webdriver.Firefox()
driver.get("https://altkirch-alsace.fr/serviable/demarches-en-ligne/prendre-un-rdv-cni/")
element = WebDriverWait(driver, 10) \
.until(lambda x: x.find_element(By.ID, "prestations"))
print(element)
You're not using WebDriverWait(...).until in the idiomatic way: instead of wrapping find_element in a lambda, use the built-in expected conditions, which retry until the element appears or the timeout expires.
You should use it like this instead:
from selenium.webdriver.support import expected_conditions as EC
...
element = WebDriverWait(driver, 10).until(
EC.presence_of_element_located((By.ID, "prestations"))
)
I had the same problem. There is no issue with waiting; there is a frame on that webpage:
<iframe src="https://www.rdv360.com/mairie-d-altkirch?ajx_md=1" width="100%"
height="600" style="border:0px;"></iframe>
The Document Object Model (DOM) of the main website does not contain any information about the page that is loaded into the frame.
Even if you wait for hours, you will not find any elements inside this frame, because it has its own DOM.
You need to switch the WebDriver context to this frame. Then you may access the frame's DOM.
As the iframe on your website has no id, you may search for all frames as described in the Selenium documentation (https://www.selenium.dev/documentation/webdriver/browser/frames/).
The code searches for all HTML tags of type "iframe" and takes the first one (list indices start at 0):
iframe = driver.find_elements(By.TAG_NAME, 'iframe')[0]
driver.switch_to.frame(iframe)
Now you may find your desired elements.
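As a minimal, self-contained sketch of that approach (the helper function and its error message are my own, not part of the Selenium API), the switch can be wrapped like this:

```python
def enter_first_iframe(driver):
    """Switch the WebDriver context into the first <iframe> on the page.

    Returns the iframe element so the caller keeps a reference.
    Raises RuntimeError if the page contains no iframe at all.
    """
    # "tag name" is the string value behind By.TAG_NAME
    iframes = driver.find_elements("tag name", "iframe")
    if not iframes:
        raise RuntimeError("no iframe found on the page")
    driver.switch_to.frame(iframes[0])  # index 0 = the first frame
    return iframes[0]

# Usage sketch (assuming `driver` is an open WebDriver on the page above):
# enter_first_iframe(driver)
# element = driver.find_element("id", "prestations")
```

When you are done inside the frame, driver.switch_to.default_content() returns you to the top-level document.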
The solution that worked in my case on a webpage like this:
<html>
...
<iframe ... id="myframe1">
...
</iframe>
<iframe ... id="myframe2">
...
</iframe>
...
</html>
my_iframe = my_WebDriver.find_element(By.ID, 'myframe1')
my_WebDriver.switch_to.frame(my_iframe)
also working:
my_WebDriver.switch_to.frame('myframe1')
According to the Selenium docs, you may use the iframe's name, its id, or the element itself to switch to that frame.
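A short sketch of the three variants in one place (the id myframe1 is taken from the example above; the function itself is mine, not from the docs):

```python
def switch_three_ways(driver):
    """Demonstrate the three argument types switch_to.frame() accepts."""
    # 1. by the frame's name or id attribute
    driver.switch_to.frame("myframe1")
    driver.switch_to.default_content()   # back to the top-level document

    # 2. by zero-based index: 0 is the first frame on the page
    driver.switch_to.frame(0)
    driver.switch_to.default_content()

    # 3. by a previously located element ("id" is the By.ID string)
    el = driver.find_element("id", "myframe1")
    driver.switch_to.frame(el)
    driver.switch_to.default_content()
```

Switching back with default_content() after each step matters when the page has several sibling frames, because frame() is resolved against the current context.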
Related
I am new to selenium and web automation tasks, and I am trying to code an application for automating papers search on PubMed, using chromedriver.
My aim is to click the top right "Sign in" button in the PubMed main page, https://www.ncbi.nlm.nih.gov/pubmed. So the problem is:
when I open the PubMed main page manually, there are no iframe tags in the HTML source, and therefore the "Sign in" element should be simply accessible by its xpath "//*[@id="sign_in"]".
when the same page is opened by Selenium instead, I cannot find that "Sign in" element by its xpath, and if I try to inspect it, the HTML source seems to have embedded it in an <iframe> tag, so that it cannot be found anymore unless a driver.switch_to.frame call is executed. But if I open the HTML source code with Ctrl+U and search for an <iframe> element, there is still none. Here is the "Sign in" element inspection capture:
["Sign in" inspection][1]
I already got round this problem by:
bot = PubMedBot()
bot.driver.get('https://www.ncbi.nlm.nih.gov/pubmed')
sleep(2)
frames = bot.driver.find_elements_by_tag_name('iframe')
bot.driver.switch_to.frame(frames[0])
btn = bot.driver.find_element_by_xpath('/html/body/a')
btn.click()
But all I would like to understand is why the inspection code is different from the html source code, whether this <iframe> element is really appearing from nowhere, and if so why.
Thank you in advance.
You are facing this issue due to synchronization. Kindly find the below solution to resolve your issue:
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium import webdriver
driver = webdriver.Chrome(executable_path=r"path of chromedriver.exe")
driver.maximize_window()
wait = WebDriverWait(driver, 10)
driver.get("https://www.ncbi.nlm.nih.gov/pubmed")
iframe = wait.until(EC.presence_of_element_located((By.TAG_NAME, "iframe")))
driver.switch_to.frame(iframe)
wait.until(EC.element_to_be_clickable((By.XPATH, "//a[contains(text(), 'Sign in to NCBI')]"))).click()
I am trying to scrape the available apartment listings from the following webpage: https://3160599v2.onlineleasing.realpage.com/
I am using the Python implementation of Selenium, but so far I haven't found an effective solution to programmatically get the content. My most basic code is the following, which currently just returns the non-dynamic HTML source code:
from selenium import webdriver
driver = webdriver.Chrome('/path_to_driver')
driver.get('https://3160599v2.onlineleasing.realpage.com/')
html = driver.page_source
The returned html variable does not contain the apartment listings I need.
If I 'Inspect' the element using Chrome's built-in inspect tool, I can see that the content is within an un-classed iframe: <iframe frameborder="0" realpage-oll-widget="RealPage-OLL-Widget" style="width: 940px; border: none; overflow: hidden; height: 2251px;"></iframe>
Several children down within this iframe you can also see the div <div class="main-content"> which contains all the info I need.
Other solutions I have tried include implementing an explicit WebDriverWait:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CLASS_NAME, 'main-content')))
I get a TimeoutException with this method as the element is never found.
I also tried using the driver.switch_to.frame() method, with no success.
The only steps that have actually allowed me to get the apartment listings out of the webpage have been (using Chrome):
Manually right-click on an element of the listings within the webpage
Click Inspect
Find the div 'main-content'
Manually right-click on this div and select Copy -> Copy Element
This is not an effective solution since I'm seeking to automate this process.
How can I get this dynamically generated content out of the webpage in a programmatic way?
Try the below code to switch to the iframe. Note that a locator tuple is passed to the expected condition, so the wait itself retries the lookup (calling find_element directly would throw immediately if the iframe had not loaded yet):
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait as wait
wait(driver, 10).until(EC.frame_to_be_available_and_switch_to_it((By.XPATH, '//iframe[@realpage-oll-widget="RealPage-OLL-Widget"]')))
Also note that the method that switches to an iframe is switch_to.frame(), not switch-to.frame().
You cannot directly see content that is inside an iframe; you need to change frames. You do this by first selecting the iframe element and then switching to it with the driver.switch_to.frame() function.
iframe = driver.find_element_by_id('iframe')
driver.switch_to.frame(iframe)
After that you can access the iframe's content.
Alternatively, you can take the src attribute of the iframe and then navigate to that URL with Selenium. In the end, the iframe content is just another HTML page.
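A sketch of that alternative, assuming a page with a single iframe (the helper name is mine, not a Selenium API):

```python
def iframe_src(driver):
    """Return the src URL of the first iframe on the current page.

    The iframe content is an ordinary HTML page, so the returned URL can
    be loaded directly with driver.get() and scraped without any frame
    switching.
    """
    # "tag name" is the string value behind By.TAG_NAME
    iframe = driver.find_element("tag name", "iframe")
    return iframe.get_attribute("src")

# Usage sketch:
# driver.get("https://3160599v2.onlineleasing.realpage.com/")
# driver.get(iframe_src(driver))  # now the frame's DOM is the main document
```

One caveat: some sites serve different content when a frame URL is opened on its own, so the switch_to.frame() route is the more general one.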
I'm building a scraper that needs to scrape prices from a dozen different websites.
All the websites use JavaScript to show the price, so I went with Selenium in order to scrape the data needed.
Before starting to build the scraper, I created, in an external file, a list of xpaths to the price element for each URL I was scraping.
I got those xpaths using Firefox and Firebug. However, I get an error every time I try to get these elements with Selenium (PhantomJS driver):
selenium.common.exceptions.NoSuchElementException: Message: {"errorMessage":"Unable to find element with xpath './div'","request":{"headers":{"Accept":"application/json","Accept-Encoding":"identity","Connection":"close","Content-Length":"89","Content-Type":"application/json;charset=UTF-8","Host":"127.0.0.1:51048","User-Agent":"Python-urllib/2.7"},"httpVersion":"1.1","method":"POST","post":"{\"using\": \"xpath\", \"sessionId\": \"27f41b80-cf63-11e6-bbcc-13b1a315759a\", \"value\": \"./div\"}","url":"/element","urlParsed":{"anchor":"","query":"","file":"element","directory":"/","path":"/element","relative":"/element","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/element","queryKey":{},"chunks":["element"]},"urlOriginal":"/session/27f41b80-cf63-11e6-bbcc-13b1a315759a/element"}}
It seems that my xpath is wrong. However, I double-checked it using other plugins, and it is correct every time I test it in Firefox.
Here are two different xpaths that should both work (they did in Firefox, but didn't with Selenium):
"id('regular-hero')/div[3]/div[1]/div[2]/div/div[1]/div/div/span"
"/html/body/div[2]/div[3]/div[1]/div[2]/div/div[2]/div/div/span/text()[1]"
And here's the target page's HTML code.
And here's the code to get the elements using Selenium:
self.browser = wd.PhantomJS()
for n in xrange(len(self.url_list)):
    url = self.url_list[n]
    provider = self.provider_list[n]
    self.browser.get(url)
    for plan in provider:
        for hosting_plan in provider[plan]:
            xpath = hosting_plan.values()[0]  # Get the xpath of a plan
            price_elem = self.browser.find_element_by_xpath(xpath)
            print price_elem
self.browser.close()
All the loops are used to traverse the external JSON file that holds the xpath list.
What is wrong? What should I try? Can lxml help me (given that the HTML code is sometimes broken)?
According to the provided link, you can match the required element with the following XPath:
//span[@class="term-price"]
If your element is generated with JavaScript, you need to wait some time for the element to appear:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
element = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, "//span[@class='term-price']")))
I'm trying to scrape this website with Selenium for the list of company names, codes, industries, sectors, market caps, etc. in the table. I'm new to it and have written the code below:
path_to_chromedriver = r'C:\Documents\chromedriver'
browser = webdriver.Chrome(executable_path=path_to_chromedriver)
url = r'http://sgx.com/wps/portal/sgxweb/home/company_disclosure/stockfacts'
browser.get(url)
time.sleep(15)
output = browser.page_source
print(output)
However, I'm able to get the below tags, but not the data in them:
<div class="table-wrapper results-display">
<table>
<thead>
<tr></tr>
</thead>
<tbody></tbody>
</table>
</div>
<div class="pager results-display"></div>
I have previously also tried BS4 to scrape it, but failed. Any help is much appreciated.
The results are in an iframe - switch to it and then get the .page_source:
iframe = driver.find_element_by_css_selector("#mainContent iframe")
driver.switch_to.frame(iframe)
I would also add a wait for the table to be loaded:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
wait = WebDriverWait(driver, 10)
# locate and switch to the iframe
iframe = driver.find_element_by_css_selector("#mainContent iframe")
driver.switch_to.frame(iframe)
# wait for the table to load
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, '.companyName')))
print(driver.page_source)
This is totally doable. The easiest approach might be a 'find_elements' call (note that it's plural) to grab all of the <tr> elements. It returns a list that you can walk through, calling find_element (singular) on each item, this time locating by class.
You may also be running into a timing issue. I noticed that the data you are looking for loads VERY slowly, so you probably need to wait for it. The best way is to check for its existence until it appears, then load it. find_elements calls (plural again) will not throw an exception when no elements are found; they just return an empty list. This makes them a decent way to check for the data to appear.
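That empty-list behaviour makes a simple polling helper possible. This is a generic sketch of my own (not a Selenium API): it takes any zero-argument callable returning a list, e.g. a lambda around driver.find_elements, and polls until something shows up or the timeout elapses.

```python
import time

def wait_for_elements(find, timeout=15.0, interval=0.5):
    """Poll `find` (a zero-argument callable returning a list) until it
    yields a non-empty list; return [] once `timeout` seconds elapse."""
    deadline = time.time() + timeout
    while True:
        elements = find()
        if elements:
            return elements
        if time.time() >= deadline:
            return []
        time.sleep(interval)

# Usage sketch with a real driver ("tag name" is the By.TAG_NAME string):
# rows = wait_for_elements(lambda: driver.find_elements("tag name", "tr"))
```

Selenium's WebDriverWait does essentially this for you; the hand-rolled loop is only shown to make the retry-until-non-empty idea explicit.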
How do I use Python to simply find a link containing text/id/etc and then click that element?
My imports:
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.keys import Keys
Code to go to my URL:
browser = webdriver.Firefox() # Get local session of firefox
browser.get(myURL) # Load page
#right here I want to look for element containing "Click Me" in the text
#and then click it
Look at the method list of the WebDriver class: http://selenium.googlecode.com/svn/trunk/docs/api/py/webdriver_remote/selenium.webdriver.remote.webdriver.html
For example, to find an element by its ID and click it, the code will look like this:
element = browser.find_element_by_id("your_element_id")
element.click()
There are a couple of ways to find a link (or any element):
element = driver.find_element_by_id("element-id")
element = driver.find_element_by_name("element-name")
element = driver.find_element_by_xpath("//input[@id='element-id']")
element = driver.find_element_by_link_text("link-text")
element = driver.find_element_by_class_name("class-name")
I think the best option for you is find_element_by_link_text, since it's a link.
Once you have saved the element in a variable, you call the click function: element.click() or element.send_keys(Keys.RETURN).
Take a look at the selenium-python documentation; there are a couple of examples there.
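Putting those pieces together, a tiny helper might look like this ("link text" is the string value behind By.LINK_TEXT; the function name is my own):

```python
def click_link(driver, text):
    """Find a link by its exact visible text and click it."""
    link = driver.find_element("link text", text)
    link.click()
    return link

# Usage sketch, assuming `browser` is an open WebDriver:
# click_link(browser, "Click Me")
```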
You just use this code
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
driver = webdriver.Firefox()
driver.get("https://play.spotify.com/")# here change your link
wait=WebDriverWait(driver,250)
# it will wait for 250 seconds an element to come into view, you can change the #value
submit=wait.until(EC.presence_of_element_located((By.LINK_TEXT, 'Click Me')))
submit.click()
A wait is the best fit for JavaScript-enabled websites: if the page loads in the WebDriver but, due to JavaScript, some elements only appear a few seconds later, it is best to use a wait on those elements.
For a link containing text:
browser = webdriver.Firefox()
browser.find_element_by_link_text(link_text).click()
You can also do it by xpath:
browser.find_element_by_xpath("//a[@href='your_link']").click()
Finding Element by Link Text
driver.find_element_by_link_text("")
Finding Element by Name
driver.find_element_by_name("")
Finding Element By ID
driver.find_element_by_id("")
Try SST: it is a simple yet very good test framework for Python.
Install it first: http://testutils.org/sst/index.html
Then:
Imports:
from sst.actions import *
First define a variable, element, holding the element you're after:
element = assert_element(text="some words and more words")
I used this: http://testutils.org/sst/actions.html#assert-element
Then click that element:
click_element(element)
And that's it.
First start one instance of the browser. Then you can use any of the following methods to get an element or link. After locating the element, you can call element.click(), element.send_keys(Keys.RETURN), or any other Selenium WebDriver method on it.
browser = webdriver.Firefox()
Selenium provides the following methods to locate elements in a page:
To find an individual element (these methods return a single element):
browser.find_element_by_id(id)
browser.find_element_by_name(name)
browser.find_element_by_xpath(xpath)
browser.find_element_by_link_text(link_text)
browser.find_element_by_partial_link_text(partial_link_text)
browser.find_element_by_tag_name(tag_name)
browser.find_element_by_class_name(class_name)
browser.find_element_by_css_selector(css_selector)
To find multiple elements (these methods return a list). Later you can iterate through the list, or locate an individual element from it with elementlist[n].
browser.find_elements_by_name(name)
browser.find_elements_by_xpath(xpath)
browser.find_elements_by_link_text(link_text)
browser.find_elements_by_partial_link_text(partial_link_text)
browser.find_elements_by_tag_name(tag_name)
browser.find_elements_by_class_name(class_name)
browser.find_elements_by_css_selector(css_selector)
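For example, iterating over a find_elements result to collect the text of every link on the page (a sketch; the helper name is mine, and "tag name" is the string value behind By.TAG_NAME):

```python
def collect_link_texts(browser):
    """Return the visible text of every <a> element on the page."""
    links = browser.find_elements("tag name", "a")
    return [link.text for link in links]

# Or pick a single element from the list by index:
# third_link = browser.find_elements("tag name", "a")[2]
```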
browser.find_element_by_xpath("")
browser.find_element_by_id("")
browser.find_element_by_name("")
browser.find_element_by_class_name("")
Inside the ("") you have to put the respective xpath, id, name, class_name, etc...