I am trying to scrape the available apartment listings from the following webpage: https://3160599v2.onlineleasing.realpage.com/
I am using the Python implementation of Selenium, but so far I haven't found an effective solution to programmatically get the content. My most basic code is the following, which currently just returns the non-dynamic HTML source code:
from selenium import webdriver
driver = webdriver.Chrome('/path_to_driver')
driver.get('https://3160599v2.onlineleasing.realpage.com/')
html = driver.page_source
The returned html variable does not contain the apartment listings I need.
If I 'Inspect' the element using Chrome's built-in inspect tool, I can see that the content is within an un-classed iframe: <iframe frameborder="0" realpage-oll-widget="RealPage-OLL-Widget" style="width: 940px; border: none; overflow: hidden; height: 2251px;"></iframe>
Several children down within this iframe you can also see the div <div class="main-content"> which contains all the info I need.
Other solutions I have tried include implementing an explicit WebDriverWait:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CLASS_NAME, 'main-content')))
I get a TimeoutException with this method as the element is never found.
I also tried using the driver.switch_to.frame() method, with no success.
The only steps that have actually allowed me to get the apartment listings out of the webpage have been (using Chrome):
Manually right-click on an element of the listings within the webpage
Click Inspect
Find the div 'main-content'
Manually right-click on this div and select Copy -> Copy Element
This is not an effective solution since I'm seeking to automate this process.
How can I get this dynamically generated content out of the webpage in a programatic way?
Try to use below code to switch to iframe:
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait as wait
wait(driver, 10).until(EC.frame_to_be_available_and_switch_to_it(driver.find_element_by_xpath('//iframe[#realpage-oll-widget="RealPage-OLL-Widget"]')))
Also note that method that allows to switch to static iframe is switch_to.frame(), but not switch-to.frame()
You can not directly see the content which is in the iframe. You need to change frame. You can do this by firstly selecting 'iframe element' and then switching to it with driver.switch_to.frame() function.
iframe = driver.get_element_by_id('iframe')
driver.switch_to.frame(iframe)
After that you can access the iframe's content.
Alternatively, you can take the source attribute of iframe then going to that page with selenium. In the end, iframe content is another html page.
Related
I have been trying to get access to the 4 images on this page: https://altkirch-alsace.fr/serviable/demarches-en-ligne/prendre-un-rdv-cni/
However the grey region seems to be Ajax-loaded (according to its class name). I want to get the element <div id="prestations"> inside of it but can't access it, nor any other element within the grey area.
I have tried to follow several answers to similar questions, but no matter how long I wait I get an error that the element is not found ; the element is here when I click "Inspect element" but I don't see it when I click "View source". Does that mean I can't access it through selenium?
Here is my code:
from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.common.by import By
driver = webdriver.Firefox()
driver.get("https://altkirch-alsace.fr/serviable/demarches-en-ligne/prendre-un-rdv-cni/")
element = WebDriverWait(driver, 10) \
.until(lambda x: x.find_element(By.ID, "prestations"))
print(element)
You're not using WebDriverWait(...).until properly. Your lambda is using find_element, which throws exception when it is called and the element is not found.
You should use it like this instead:
from selenium.webdriver.support import expected_conditions as EC
...
element = WebDriverWait(driver, 10).until(
EC.presence_of_element_located((By.ID, "prestations"))
)
Got the same problem. There is no issue with waiting. There is a frame on that webpage:
<iframe src="https://www.rdv360.com/mairie-d-altkirch?ajx_md=1" width="100%"
height="600" style="border:0px;"></iframe>
The Document Object Model - DOM of the main website does not contain any informations about the webpage that is loaded into a frame.
Even waiting hours you will not find any elements inside this frame, as it has its own DOM.
You need to switch the WebDriver context to this frame. Than you may access the frame's DOM:
As on your website the iframe has no id, you may search for all frames like described on the selenium webpage (https://www.selenium.dev/documentation/webdriver/browser/frames/):
The code searches for all HTML-tags of type "iframe" and takes the first (maybe it should be [0] not [1])
iframe = driver.find_elements(By.TAG_NAME,'iframe')[1]
driver.switch_to.frame(iframe)
Now you may find your desired elements.
The solution that worked in my case on a webpage like this:
<html>
...
<iframe ... id="myframe1">
...
<\iframe>
<iframe ... id="myframe2">
...
<\iframe>
...
</html>
my_iframe = my_WebDriver.find_element(By.ID, 'myframe_1')
my_Webdriver.swith_to.frame(my_iframe)
also working:
my_WebDriver.switch_to.frame('myframe_1')
According Selenium docs you may use iframe's name, id or the element itself to switch to that frame.
I'm using xpath to scrape information from a dynamic html table. The information I'm trying to scrape is inside of a tag called #document and I'm not sure how to include this in the xpath since it doesn't follow the normal <>...</> format of most html elements. I also don't have the option to select the xpath when I open the options to the left of the tag in the inspector. What do I do here? I've included a snippet below of the tag I'm referring to.
<iframe>
#document
<html>...</html>
</iframe>
The <html> underneth the #document is within an <iframe>. To access the elements within that <html> you have to:
Induce WebDriverWait for the desired frame to be available and switch to it as follows:
WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.XPATH,"iframe_xpath")))
Note : You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
I'm trying to get some information about a book of Amazon, which worked great for me until I got to the point where I wanted to scrape the content description. The content description is in an iframe-container in which a new HTML-code is started.
I have no problem catching the container via
content = driver.find_element_by_xpath("//iframe[#id='bookDesc_iframe']")
but I can't seem to get the part with the content. I tried
content_text = content.find_element_by_xpath("//div[#id='iframeContent']")
since is where it's hidden but it doen't work for me.
In order to access the iframe content you need to switch into that iframe.
driver.switch_to.frame(driver.find_element_by_xpath("//iframe[#id='bookDesc_iframe']"))
To continue working with other elements, not inside that iframe you will have to switch out of the iframe, to the default content, like this:
driver.switch_to.default_content()
Adding to the #Prophet answer, in case if you would like to do the same with explicit wait (which will eventually be a more reliable), you could do this :
wait = WebDriverWait(driver, 10)
wait.until(EC.frame_to_be_available_and_switch_to_it((By.XPATH, "//iframe[#id='bookDesc_iframe']")))
Imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
I am new to selenium and web automation tasks, and I am trying to code an application for automating papers search on PubMed, using chromedriver.
My aim is to click the top right "Sign in" button in the PubMed main page, https://www.ncbi.nlm.nih.gov/pubmed. So the problem is:
when I open PubMed main page manually, there are no iframes tags in the html source, and therefore the "Sign in" element should be simply accessible by its xpath "//*[#id="sign_in"]".
when same page is openened by selenium instead, I cannot find that "Sign in" element by its xpath, and if a try to inspect it, the html source seems to have embedded it in an <iframe> tag, so that it cannot be found anymore unless a driver._switch_to.frame method is executed. But if I open the html source code by Ctrl+Uand search for <iframe> element, there is still none of them. Here is the "Sign in" element inspection capture:
["Sign in" inspection][1]
I already got round this problem by:
bot = PubMedBot()
bot.driver.get('https://www.ncbi.nlm.nih.gov/pubmed')
sleep(2)
frames = bot.driver.find_elements_by_tag_name('iframe')
bot.driver._switch_to.frame(frames[0])
btn = bot.driver.find_element_by_xpath('/html/body/a')
btn.click()
But all I would like to understand is why the inspection code is different from the html source code, whether this <iframe> element is really appearing from nowhere, and if so why.
Thank you in advance.
You are facing issue due to synchronization .Kinddly find below solution to resolve your issue:
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium import webdriver
driver = webdriver.Chrome(executable_path=r"path ofchromedriver.exe")
driver.maximize_window()
wait = WebDriverWait(driver, 10)
driver.get("https://www.ncbi.nlm.nih.gov/pubmed")
iframe = wait.until(EC.presence_of_element_located((By.TAG_NAME, "iframe")))
driver.switch_to.frame(iframe)
wait.until(EC.element_to_be_clickable((By.XPATH, "//a[contains(text(), 'Sign in to NCBI')]"))).click()
Output:
Inspect element:
On this website I try to set some filters to collect data but I can't access to the table using a click event with selenium in my python script.
I noticed that I need to change the style from :
div id="filtersWrapper" class="displayNone " style="display: none;"
to
div id="filtersWrapper" class="displayNone " style="display: block;"
I think that I should use driver.execute_script(), but I have no clue how to do it
I would greatly appreciate some help with this. Thank you!
You can change an attribute on an element using javascript through selenium
element = driver.find_element_by_id('filtersWrapper')
driver.execute_script("arguments[0].setAttribute('attributeToChange', 'new value')", element)
or you can try clicking the element with javascript
driver.execute_script("arguments[0].click();", element)
I have checked the DOM Tree of the webpage. Somehow I was unable to locate any element as:
<div id="filtersWrapper" class="displayNone " style="display: none;">
However the following element exists:
<div id="filtersWrapper" class="displayNone ">
<div id="filtersArrowIndicator" class="arrowIndicator"></div>
.
<div id="economicCalendarSearchPopupResults" class="eventSearchPopupResults economicCalendarSearchPopupResults text_align_lang_base_1 dirSearchResults calendarFilterSearchResults displayNone">
</div>
</div>
Not sure if that was your desired element. A bit of more information about your usecase would have helped us to debug the issue in a better way. However, To set the display property of style attribute as block for the element you can use:
driver.execute_script("document.getElementById('filtersWrapper').style.display='block';");
You can use driver.execute_script() to accomplish this. This is how I change the style attribute in my own code:
div_to_change = driver.find_element_by_id("filtersWrapper")
driver.execute_script("arguments[0].style.display = 'block';", div_to_change)
I had a look at the website you are automating, and you might not need to use JSE at all to do this - there's a reason the div you are trying to click has style = "display: none" - it is not meant to be clicked in this context. Working around that with Javascript might not produce your intended results. This code snippet has been updated with your requirements to set a Time filter in the Economic Calendar section:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver.get("https://www.investing.com/economic-calendar/")
driver.find_element_by_id("economicCurrentTime").click()
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.ID, "filterStateAnchor"))).click()
checkbox_for_bull3 = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, "//*[#id='importance2']")))
driver.execute_script("arguments[0].scrollIntoView(true);", checkbox_for_bull3)
checkbox_for_bull3.click()
checkbox_for_time = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, "//fieldset[label[#for='timeFiltertimeOnly']]/input")))
checkbox_for_time.click()
I modified your code snippet to fix a few issues -- when navigating to the economic-calendar page, you were clicking the 'Filters' field twice which caused an issue trying to click checkbox_for_bull3. I also added a scrollIntoView() Javascript call.
I ran this on my local machine and the code executed end to end successfully.