I am trying to script some tests on an internal website with Selenium in Python.
The site contains a lot of iframes.
I have no issue switching between iframes, but I am trying to set the URL of the current iframe, and I cannot find any method to achieve this.
iframe1 = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.ID, 'iframe1')))
driver.switch_to.frame(iframe1)
iframe2 = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.ID, 'iframe2')))
driver.switch_to.frame(iframe2)
# I was expecting something like this
driver.get("/new_url_inside_my_frame.html")
But it does not work: get does not accept a relative URL in the first place, and even if I use a complete URL, it navigates the whole page rather than just the iframe.
I am pretty sure this is possible, I just cannot find how anywhere.
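One workaround sometimes suggested (an untested sketch, not a confirmed answer for this internal site) is to navigate with execute_script, since scripts run in the context of the currently selected frame:
# after driver.switch_to.frame(iframe2), scripts execute inside that frame,
# so even a relative URL resolves against the frame's own document
driver.execute_script("window.location.href = arguments[0];", "/new_url_inside_my_frame.html")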
I have a list of domains that I would like to loop over and screenshot using Selenium. However, the cookie consent prompt means the full page is not viewable. Most of them have different consent buttons; what is the best way of accepting these? Or is there another method that could achieve the same results?
urls for reference: docjournals.com, elcomercio.com, maxim.com, wattpad.com, history10.com
You'll need to click accept individually for every website.
You can do that using
from selenium.webdriver.common.by import By
driver.find_element(By.XPATH, "your_XPATH_locator").click()
To get around the XPath selectors varying from page to page, you can check driver.current_url and use the URL to figure out which selector you need.
Alternatively, if you iterate over the pages anyway, you can do it like this:
page_1 = {
    'url': 'https://docjournals.com',
    'selector': 'example_selector_1'
}
page_2 = {
    'url': 'https://elcomercio.com',
    'selector': 'example_selector_2'
}
pages = [page_1, page_2]
for page in pages:
    driver.get(page['url'])
    driver.find_element(By.XPATH, page['selector']).click()
From the snapshot, as you can observe, different URLs have different consent buttons; they may vary with respect to:
innerText
tag
attributes
implementation (iframe / shadowRoot)
Conclusion
There can't be a generic solution to accept/deny the cookie consent, as at times:
You may need to induce WebDriverWait for the element_to_be_clickable() and click on the consent button.
You may need to switch to an iframe. See: Unable to locate cookie acceptance window within iframe using Python Selenium
You may need to traverse within a shadowRoot. See: How to get past a cookie agreement page using Python and Selenium?
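As a rough sketch of how those pieces can fit together for a single site (the locators below are hypothetical placeholders, not the real ones for the listed domains):
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://docjournals.com")
# if the consent banner lives inside an iframe, switch to it first (locator is a guess)
WebDriverWait(driver, 10).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR, "iframe[title*='consent']")))
# wait for the accept button to become clickable, then click it (button text is a guess)
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//button[contains(., 'Accept')]"))).click()
# switch back to the main document before taking the screenshot
driver.switch_to.default_content()
driver.save_screenshot("docjournals.png")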
I am trying to use Selenium to log into a printer and conduct some tests. I really do not have much experience with this process and have found it somewhat confusing.
First, to get the values I wanted, I opened the printer's web page in Chrome and right-clicked > "View page source". This turned out not to be helpful: from there I can only see a bunch of <script> tags that call some .js scripts. I am assuming this is a big part of my problem.
Next I selected the "Inspect" option after right-clicking. From here I can see the actual HTML that is loaded. I logged into the site and recorded the process in Chrome. With this I was able to identify the elements which contain the username and password. I went to this part of the HTML, right-clicked, and copied the XPath. I then tried to use Selenium's find_element_by_xpath, but still no luck. I have tried all the other methods too (find by ID and by name), however they return an error that the element is not found.
I feel like there is something fundamental here that I am not understanding. Does anyone have any experience with this?
Note: I am using Python 3.7 and Selenium, however I am not opposed to trying something other than Selenium if there is a more graceful way to accomplish this.
My code looks something like this:
EDIT
Here is my updated code. I can confirm this is not just a timing/wait issue: I have managed to successfully grab the first two outer elements, but as soon as I go deeper it errors out.
def sel_test():
    chromeOptions = Options()
    chromeOptions.add_experimental_option("useAutomationExtension", False)
    browser = webdriver.Chrome(chrome_options=chromeOptions)
    url = 'http://<ip address>/'
    browser.get(url)
    try:
        element = WebDriverWait(browser, 10).until(EC.presence_of_element_located((By.XPATH, '//*[@id="ccrx-root"]')))
    finally:
        browser.quit()
The element that I want is buried inside this tag; maybe that has something to do with it? It may be related to this post:
<frame name="wlmframe" src="../startwlm/Start_Wlm.htm?arg11=">
As mentioned in this post, you can only work with the frame that is currently in focus. You need to tell Selenium to switch frames in order to access content inside child frames.
For example:
browser.switch_to.frame('wlmframe')
This will then load the nested content so you can access the child elements.
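A minimal sketch building on the snippet from the question (the final wait simply targets the ccrx-root element the question was already looking for):
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

browser.get(url)
# switch into the frame that actually contains the login form
WebDriverWait(browser, 10).until(EC.frame_to_be_available_and_switch_to_it((By.NAME, 'wlmframe')))
# elements inside the frame can now be located
element = WebDriverWait(browser, 10).until(EC.presence_of_element_located((By.XPATH, '//*[@id="ccrx-root"]')))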
Your issue is most likely due to either the element not loading on the page until after your bot searches for it, or a pop-up changing the XPath of the element.
Try this:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
delay = 3 # seconds
try:
    elementUsername = WebDriverWait(browser, delay).until(EC.presence_of_element_located((By.XPATH, 'element-xpath')))
    elementUsername.send_keys('your username')
except TimeoutException:
    print("Loading took too much time!")
You can find out more about this here.
I have been trying to use Selenium to scrape entire web pages. I expect at least a handful of them are SPAs built with Angular, React, or Vue, which is why I am using Selenium.
I need to download the entire page (if some content isn't loaded because lazy loading requires scrolling down, that is fine). I have tried setting a time.sleep() delay, but that has not worked. After I get the page I am looking to hash it and store it in a DB to compare later and check whether the content has changed. Currently the hash is different every time, and that is because Selenium is not downloading the entire page; each time a different part is missing. I have confirmed this on several web pages, not just a single one.
I also have probably 1000+ web pages to go through, and just collecting all the links by hand takes long enough that I do not have time to find an element on each of them to make sure it is loaded.
How long this process takes is not important. If it takes 1+ hours so be it, speed is not important only accuracy.
If you have an alternative idea please also share.
My driver declaration
from selenium import webdriver
from selenium.common.exceptions import WebDriverException
driverPath = '/usr/lib/chromium-browser/chromedriver'
def create_web_driver():
    options = webdriver.ChromeOptions()
    options.add_argument('--headless')
    # set the window size
    options.add_argument('--window-size=1200,600')
    # try to initialize the driver
    try:
        driver = webdriver.Chrome(executable_path=driverPath, chrome_options=options)
    except WebDriverException:
        print("failed to start driver at path: " + driverPath)
        return None
    return driver
My URL call, with my timeout = 20:
driver.get(url)
time.sleep(timeout)
content = driver.page_source
content = content.encode('utf-8')
hashed_content = hashlib.sha512(content).hexdigest()
^ I am getting a different hash here every time, since the same URL is not producing the same page source.
As the Application Under Test (AUT) is based on Angular, React, or Vue, Selenium seems to be the perfect choice.
Now, the fact that you are fine with some content not being loaded (because lazy loading requires scrolling) makes the use case feasible. But the constraint ...do not have time to find an element on them to make sure it is loaded... can't really be compensated for by inducing time.sleep(), as time.sleep() has certain drawbacks. You can find a detailed discussion in How to sleep webdriver in python for milliseconds. It is also worth mentioning that the state of the HTML DOM will be different for each of the 1000-odd web pages.
Solution
A couple of viable solutions:
A potential solution is to induce WebDriverWait and ensure that some HTML elements are loaded, as per the discussion How can I make sure if some HTML elements are loaded for Selenium + Python?, validating at least one of the following (a sketch follows this list):
Page Title
Page Heading
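A sketch of what that could look like when the exact title and heading text differ across the 1000-odd pages (this assumes every page at least sets a non-empty <title> and renders an h1; adjust the locator to whatever your pages actually share):
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

driver.get(url)
# wait until the document title has been populated
WebDriverWait(driver, 20).until(lambda d: d.title.strip() != "")
# wait until a top-level heading is present in the DOM
WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.TAG_NAME, "h1")))
content = driver.page_source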
Another solution would be to tweak the capability pageLoadStrategy. You can set the pageLoadStrategy for all 1000-odd web pages to a common point, assigning one of the following values:
normal (full page load)
eager (interactive)
none
You can find a detailed discussion in How to make Selenium not wait till full page load, which has a slow script?
If you implement pageLoadStrategy, the page_source method will be triggered at the same tripping point for every page, and you would possibly see identical hashed_content values.
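For reference, a sketch of setting pageLoadStrategy through a Selenium 4 style Options object (with Selenium 3 the same capability can be passed via DesiredCapabilities):
from selenium import webdriver

options = webdriver.ChromeOptions()
# 'eager' returns control once the DOM is interactive, before all subresources have finished loading
options.page_load_strategy = 'eager'  # or 'normal' / 'none'
driver = webdriver.Chrome(options=options)

driver.get(url)
content = driver.page_source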
In my experience time.sleep() does not work well with dynamic loading times.
If the page is javascript-heavy you have to use the WebDriverWait clause.
Something like this:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
driver = webdriver.Chrome()
driver.get(url)
element = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, "[my-attribute='my-value']")))
Replace 10 with whatever timeout you want, and By.CSS_SELECTOR and its value with whatever locator type and value you want to use as a reference.
You can also wrap the WebDriverWait in a try/except block catching the TimeoutException exception, which you can import from the submodule selenium.common.exceptions, in case you want to set a hard limit.
You can probably put that inside a while loop if you truly want it to keep checking until the page has loaded; I couldn't find any reference in the docs about waiting "forever", so you'll have to experiment with it.
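A rough sketch of that pattern, reusing the placeholder locator from above (the loop has no upper bound by design, so add your own exit condition if you ever want it to give up):
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

while True:
    try:
        WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, "[my-attribute='my-value']")))
        break  # the element appeared, stop waiting
    except TimeoutException:
        # not there yet, poll again
        continue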
I'm trying to scrape a web page using Selenium. The XPaths suggested by inspecting the page and right-clicking are of an unstable kind (/html/body/table[2]/tbody/tr[1]/td/form/table/tbody/tr[2]). So I tried the following instead:
driver = webdriver.Chrome("path")
driver.get("https://www.bundesfinanzhof.de/entscheidungen/entscheidungen-online")
time.sleep(1)
links=driver.find_element_by_xpath('//tr[#class="SuchForm"]')
or even
links=driver.find_elements_by_xpath('//*[#class="SuchForm"]')
Neither returns any results. However, earlier on in the page I can obtain:
links=driver.find_element_by_xpath('//iframe')
links.get_attribute('src')
It seems that after:
<script language="JavaScript" src="/rechtsprechung/jscript/list.js" type="text/javascript"></script>
I can no longer get to any of the elements.
How do I determine the correct XPath?
suggests that parts within a script are impossible to parse. However, the path I am after does not seem to me to be within a script. Am I misinterpreting how scripts work on a page?
For instance, later on there is a path:
/html/body/table[2]/tbody/tr[1]/td/script
I would expect that one to create such a problem. I am by no means a programmer, so my understanding of this subject is limited. Can someone explain what the problem is and, if possible, a solution?
Attempted using solutions from:
Find element text using xpath in selenium-python NOt Working
xpath does not work with this site, pls verify
The table is located inside an iframe, so you need to switch to that iframe before handling required tr:
from selenium.webdriver.support.ui import WebDriverWait as wait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
driver.get("https://www.bundesfinanzhof.de/entscheidungen/entscheidungen-online")
wait(driver, 10).until(EC.frame_to_be_available_and_switch_to_it((By.XPATH, "//iframe[@src='https://juris.bundesfinanzhof.de/cgi-bin/rechtsprechung/list.py?Gericht=bfh&Art=en']")))
link = driver.find_element_by_xpath('//tr[#class="SuchForm"]')
Use driver.switch_to.default_content() to switch back from iframe
Using Selenium in Python, I'd like to download a page, and save the HTML code of a particular div, identified by its id. I've got the following:
from selenium.webdriver import Firefox
from selenium.webdriver.support.ui import WebDriverWait
...
with closing(Firefox()) as browser:
    browser.get(current_url)
    WebDriverWait(browser, timeout=3).until(lambda x: x.find_element_by_id('element_id'))
    element = browser.find_element_by_id('element_id')
element is of type selenium.webdriver.remote.webelement.WebElement. Is there a way to get the HTML code (not processed in any way) from element? Is there some better way, using Selenium, of accomplishing this task?
Right from pydoc selenium.webdriver.remote.webelement.WebElement:
| text
| Gets the text of the element.
Use the .text attribute.
If you really are after the HTML source of the element
then please see: Get HTML Source of WebElement in Selenium WebDriver using Python
As stated above, it's not as straightforward as you'd like.
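In short, the approach from that linked answer is to read the element's innerHTML or outerHTML attribute:
# innerHTML gives the markup of the element's children,
# outerHTML includes the element's own tag as well
inner_html = element.get_attribute('innerHTML')
outer_html = element.get_attribute('outerHTML')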