I am trying to search through all the html of websites that I reach using selenium webdriver. In selenium, when I have an iframe, I must switch to the iframe and then switch back to the main html to search for other iframes.
However, with nested iframes, this can be quite complicated. I must switch to an iframe, search it for iframes, then switch to one iframe found, search IT for iframes, then to go to another iframe I must switch to the main frame, then have my path saved to switch back to where I was before, etc.
Unfortunately, many pages I've found have iframes within iframes within iframes (and so on).
Is there a simple algorithm for this? Or a better way of doing it?
Finding iframes solely by HTML element tag or attributes (including ID) appears to be unreliable.
On the other hand, recursively searching by iframe index works reasonably well.
def find_all_iframes(driver):
    iframes = driver.find_elements_by_xpath("//iframe")
    for index, iframe in enumerate(iframes):
        # Your sweet business logic applied to iframe goes here.
        driver.switch_to.frame(index)
        find_all_iframes(driver)
        driver.switch_to.parent_frame()
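A variation on the index-based recursion above, offered as a rough sketch rather than a tested recipe: instead of only visiting each frame, it records the chain of indexes that leads to it, so any frame can be revisited later from the top-level document. The names collect_frame_paths and switch_to_path are made up for this illustration. It assumes Selenium 4's find_elements(by, value) call, and that the order of "//iframe|//frame" matches the frame indexes, which is the same assumption the snippet above makes.

```python
def collect_frame_paths(driver, path=()):
    """Return the index path of every (i)frame reachable from the current context."""
    paths = []
    # "xpath" is the string value of selenium's By.XPATH.
    count = len(driver.find_elements("xpath", "//iframe|//frame"))
    for index in range(count):
        child = path + (index,)
        paths.append(child)
        driver.switch_to.frame(index)      # descend into the child frame
        paths.extend(collect_frame_paths(driver, child))
        driver.switch_to.parent_frame()    # climb back up one level
    return paths


def switch_to_path(driver, path):
    """Jump to a frame from the top-level document by following its index path."""
    driver.switch_to.default_content()
    for index in path:
        driver.switch_to.frame(index)
```

On a page with two top-level frames, the first of which contains one nested frame, collect_frame_paths(driver) would return [(0,), (0, 0), (1,)], and switch_to_path(driver, (0, 0)) would land in the nested one.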
I was not able to find a website with several layers of nested frames to fully test this concept, but I was able to test it on a site with just one layer of nested frames. So, this might require a bit of debugging to deal with deeper nesting. Also, this code assumes that each of the iframes has a name attribute.
I believe that using a recursive function along these lines will solve the issue for you, and here's an example data structure to go along with it:
def frame_search(path):
    framedict = {}
    for child_frame in browser.find_elements_by_tag_name('frame'):
        child_frame_name = child_frame.get_attribute('name')
        framedict[child_frame_name] = {'framepath' : path, 'children' : {}}
        xpath = '//frame[@name="{}"]'.format(child_frame_name)
        browser.switch_to.frame(browser.find_element_by_xpath(xpath))
        framedict[child_frame_name]['children'] = frame_search(framedict[child_frame_name]['framepath']+[child_frame_name])
        # ...
        # do something involving this child_frame
        # ...
        # Go back to the top-level document, then walk down the saved path
        # so the next sibling is looked up in the right context.
        browser.switch_to.default_content()
        if len(framedict[child_frame_name]['framepath'])>0:
            for parent in framedict[child_frame_name]['framepath']:
                parent_xpath = '//frame[@name="{}"]'.format(parent)
                browser.switch_to.frame(browser.find_element_by_xpath(parent_xpath))
    return framedict
You'd kick it off by calling: frametree = frame_search([]), and the framedict would end up looking something like this:
frametree =
{'child1' : {'framepath' : [], 'children' : {'child1.1' : {'framepath' : ['child1'], 'children' : {...etc}}}},
 'child2' : {'framepath' : [], 'children' : {'child2.1' : {'framepath' : ['child2'], 'children' : {...etc}}}}}
A note: I wrote this to identify frames by their attributes, rather than reusing the elements returned by find_elements, because in certain scenarios Selenium throws a StaleElementReferenceException once a page has been open for too long, and the element references found earlier are no longer usable. The frame's name attribute is unlikely to change, so looking the frame up again by XPath is a bit more stable. Hope this helps.
You can nest one iFrame into another iFrame by remembering the simple line of code to position, then re-position, the cursor back to the same area of the screen by using the <DIV> tag, as in the following COMPLETE code. Always remember to put the larger iFrame FIRST, then define the position of the SMALLER iFrame SECOND, as in the following FULL example:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>Daneiella Oddie, Austrailian Ballet Dancer, dancing to Bach-Gounod's Ave Maria</title>
</head>
<body bgcolor="#ffffcc">
<DIV style="position: absolute; top:0px; left:0px; width:0px; height:0px"></div>
<DIV style="position: absolute; top:10px; left:200px; width:900px; height:500px">
<iframe width="824" height="472" src="http://majordomoers.me/Videos/DanielaOddiDancingToBack_GounodsAveMaria.mp4" frameborder="0" allowfullscreen></iframe>
</div>
<DIV style="position: absolute; top:0px; left:0px; width:0px; height:0px"></div>
<DIV style="position: absolute; top:10px; left:0px; width:50px; height:50px">
<iframe src="http://majordomoers.me/Videos/LauraUllrichSingingBach_GounodsAveMaria.mp4" frameborder="0" allowfullscreen></iframe>
</div>
<DIV style="position: absolute; top:0px; left:0px; width:0px; height:0px"></div>
<DIV style="position: absolute; top:470px; left:10px; width:1050px; height:30px">
<br><font face="Comic Sans MS" size="3" color="red">
<li><b>Both Videos will START automatically...but the one with the audio will precede the dancing by about 17 seconds. You should keep
<li>both videos at the same size as presented here. In all, just lean back and let it all unfold before you, each in its own time.</li></font>
</div>
<br>
</body>
</html>
You can use the below code to get the nested frame hierarchy... Change the getAttribute according to your DOM structure.
static Stack<String> stackOfFrames = new Stack<>();
....
....
public static void getListOfFrames(WebDriver driver) {
    List<WebElement> iframes = driver.findElements(By.xpath("//iframe|//frame"));
    int numOfFrames = iframes.size();
    for (int i = 0; i < numOfFrames; i++) {
        // Record the id of the frame about to be entered.
        stackOfFrames.push(iframes.get(i).getAttribute("id"));
        System.out.println("Current Stack => " + stackOfFrames);
        driver.switchTo().frame(i);
        getListOfFrames(driver);
        driver.switchTo().parentFrame();
        stackOfFrames.pop();
        count++; // count is presumably a static int declared in the elided code above
    }
}
I am trying to automate some tasks at work. Requests won't work because I don't have admin access to my workplace's Intercom app, so I use Selenium instead.
I want to write "Hey" in the chat box of Intercom, and send the message.
The problem is that the ember number changes every time I start a new conversation. It works when I copy in the right ember number, but as soon as I switch conversations it stops working.
I am looking for some kind of script to change the ember = XXXXX into the right number each time
Not really relevant to the code problem, but I am using Chrome in debugging mode, to avoid logging in every time I need to test the code, and I am using tkinter to have a button to press, every time I want to write "Hey" in the chat box.
Sorry, I understand it is difficult to replicate this problem.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
#___________
#In order to run Selenium in an already opened browser / session, I need to run this code in CMD:
#cd C:\Program Files (x86)\Google\Chrome\Application
#
#chrome.exe --remote-debugging-port=9222 --user-data-dir="C:\Users\peter\testprogram"
#___________
opt=Options()
opt.add_experimental_option("debuggerAddress","localhost:9222")
driver=webdriver.Chrome(executable_path="C:\\ProgramFiles\\crromedriver\\chromedriver.exe",options=opt)
def hey():
    ember = 32890
    hey = driver.find_element_by_xpath('//*[@id="ember'+str(ember)+'"]/div/div[3]/div[1]/div/p')
    hey.send_keys("Hey!")
The specific HTML element where I want to write "Hey!": (This is under the big HTML code below)
<p class="intercom-interblocks-align-left embercom-prosemirror-composer-block-selected" style="">Hey! This is where I want my text</p>
One might suggest to use
hey = driver.find_element_by_class_name('intercom-interblocks-align-left embercom-prosemirror-composer-block-selected')
hey.send_keys("Hey!")
But this doesn't work for me.
The HTML element where the ember number is changing:
<div id="ember32890" class="u__relative inbox__conversation-composer__wrapper ember-view"><div>
<div></div>
<div>
</div>
<div data-test-prosemirror-composer="" class="composer-inbox composer-style-basic o__fit conversation__text embercom-prosemirror-composer ">
<style>
.ProseMirror {
outline: none;
white-space: pre-wrap;
}
.ProseMirror .intercom-interblocks-html-block {
white-space: normal;
}
li.ProseMirror-selectednode {
outline: none;
}
.ProseMirror-selectednode.embercom-prosemirror-composer-image img,
.ProseMirror-selectednode.embercom-prosemirror-composer-video iframe,
.ProseMirror-selectednode.embercom-prosemirror-composer-messenger-card
.intercom-interblocks-messenger-card,
.ProseMirror-selectednode.embercom-prosemirror-composer-html-block,
.ProseMirror-selectednode.embercom-prosemirror-composer-button .intercom-h2b-button {
outline: 2px solid #8cf;
}
hr.ProseMirror-selectednode,
.embercom-prosemirror-composer-template.ProseMirror-selectednode,
.embercom-prosemirror-composer-mention.ProseMirror-selectednode {
outline: 1px solid #8cf;
}
</style>
<div>
<!----><div contenteditable="true" role="textbox" dir="auto" data-insertable="true" class="ProseMirror embercom-prosemirror-composer-editor dir-auto"><p class="intercom-interblocks-align-left embercom-prosemirror-composer-block-selected" style="">Hey!Hey!Hey!Hey!Hey!</p><p class="intercom-interblocks-align-left" style=""><br></p></div></div>
<div class="flex flex-row flex-wrap gap-4 embercom-prosemirror-composer-attachment-list">
<!----></div>
<!---->
<!---->
<!---->
<!---->
<!---->
<div></div>
<!---->
<!----></div>
<!---->
<!----></div></div>
If you want to use ember here is a possible solution:
hey = driver.find_element_by_xpath('//*[contains(@id, "ember")]/div/div[3]/div[1]/div/p')
hey.send_keys("Hey!")
This will probably fail if there are multiple elements with id="ember[0-9]+".
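Since only the numeric suffix of the id changes, another option is to skip the id entirely and anchor on class names that appear in the HTML you posted. The following is a sketch under that assumption, not something I could run against Intercom: COMPOSER_CSS and type_into_composer are hypothetical names, the selector targets the contenteditable div rather than the p inside it, and it uses Selenium 4's find_element(by, value).

```python
# Built only from classes visible in the posted HTML; no ember id involved.
COMPOSER_CSS = (
    "div.inbox__conversation-composer__wrapper "
    "div.ProseMirror[contenteditable='true']"
)


def type_into_composer(driver, text):
    """Focus the chat editor and type into it; returns the editor element."""
    # "css selector" is the string value of selenium's By.CSS_SELECTOR.
    editor = driver.find_element("css selector", COMPOSER_CSS)
    editor.click()          # give the editor focus before typing
    editor.send_keys(text)
    return editor
```

With the driver from the question this would be called as type_into_composer(driver, "Hey!"). If the page ever renders more than one composer, the selector would need tightening.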
If you want to access the p tag directly use find_element_by_css_selector, like so:
hey = driver.find_element_by_css_selector('.intercom-interblocks-align-left.embercom-prosemirror-composer-block-selected')
hey.send_keys("Hey!")
Your code with find_element_by_class_name did not work because it's expecting one class name and you are passing two class names (class names are separated by space).
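To make the class-name point concrete, here is a small helper (classes_to_css is a name made up for this example) that turns the value of a class attribute into the compound CSS selector form used above. It is plain string handling and does not touch Selenium.

```python
def classes_to_css(class_attr, tag=""):
    """Turn a space-separated class attribute into a compound CSS selector."""
    # Each class name gets its own leading dot; no spaces, so all must match
    # on the same element.
    return tag + "".join("." + name for name in class_attr.split())


selector = classes_to_css(
    "intercom-interblocks-align-left embercom-prosemirror-composer-block-selected",
    tag="p",
)
print(selector)
```

The resulting string can be passed to find_element_by_css_selector in place of the hand-written one.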
I'm using Selenium for testing. I want to click on an element. The element is very much clickable and visible, but it happens that the middle point of the element is obscured, causing the error.
Here is an MCVE:
HTML code (link to demo):
<style>
button {
width: 90vw;
height: 90vh;
position: fixed;
top: 5vh;
left: 5vw;
}
.cover {
background: grey;
opacity: 0.3;
width: 80vw;
height: 80vh;
position: fixed;
top: 10vh;
left: 10vw;
}
</style>
<button onclick="alert('hi');">
Click me!
</button>
<div class="cover">
I'm in the way!
</div>
Python selenium code:
from selenium import webdriver
driver = webdriver.Chrome()
driver.implicitly_wait(10)
driver.get("https://blissfulpreciouslanservers--five-nine.repl.co/")
button = driver.find_element_by_tag_name("button")
button.click()
Result:
selenium.common.exceptions.ElementClickInterceptedException: Message: element click intercepted: Element <button onclick="alert('hi');">...</button> is not clickable at point (451, 450). Other element would receive the click: <div class="cover">...</div>
This seems like a rather sad limitation of Selenium. The button is clickable, just not at all points. I don't want to have to fiddle with scrolling and coordinates.
There are many similar questions about the exception in general, e.g:
Can not click on a Element: ElementClickInterceptedException in Splinter / Selenium
Selenium can't click element because other element obscures it
Element not clickable since another element obscures it in python
However the questions are never specifically about an element that is only partially obscured, so I haven't managed to find a proper answer to my own question. The answers to other questions generally fall into these categories:
Wait until the element is clickable. Doesn't apply here.
Use action chains, e.g. ActionChains(driver).move_to_element(button).click().perform(). This doesn't help because .move_to_element() moves to the middle of the element.
Use JavaScript to perform the click. It seems like I might have to resort to this, but it's very unsatisfactory. I still want to validate that the element is clickable at least somewhere and not bypass all checks.
Have you tried to add a class to your button and then search for the class? For example:
<button class="btn" onclick="alert('hi');">
Click me!
</button>
Then use the following to find and click on that button:
driver.findElement(By.className("btn")).click();
similar to this stack overflow response by alecxe
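For the partially obscured button in the question, one possible approach (a sketch, not a built-in Selenium feature) is to probe a grid of points inside the element and ask the browser, via document.elementFromPoint, which of them would actually deliver a click to it. That keeps the "is it clickable somewhere?" check the question asks for. grid_points, HIT_TEST_JS and find_unobscured_point are hypothetical names; the only Selenium call assumed is driver.execute_script.

```python
def grid_points(x, y, width, height, steps=5):
    """Return points spread evenly inside a rectangle, excluding its border."""
    return [
        (x + width * (i + 1) / (steps + 1), y + height * (j + 1) / (steps + 1))
        for j in range(steps)
        for i in range(steps)
    ]


RECT_JS = """
const r = arguments[0].getBoundingClientRect();
return [r.left, r.top, r.width, r.height];
"""

# True when the topmost node at (px, py) is the element or one of its children.
HIT_TEST_JS = """
const [el, px, py] = arguments;
const hit = document.elementFromPoint(px, py);
return hit !== null && (hit === el || el.contains(hit));
"""


def find_unobscured_point(driver, element, steps=5):
    """Return the first viewport point where `element` would get the click, else None."""
    rect = driver.execute_script(RECT_JS, element)
    for px, py in grid_points(*rect, steps=steps):
        if driver.execute_script(HIT_TEST_JS, element, px, py):
            return px, py
    return None
```

The point returned is in viewport coordinates. It could then be used with an ActionChains pointer move followed by click(); note that the origin used for element offsets is not the same in every Selenium version, so the conversion from viewport point to offset is left out here. If None comes back, the element is covered at every probed point and the test should fail.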
<iframe id="xyz" src="https://www.XXXXXX.com/" allowfullscreen="yes" style="width: 100%; height: 100%;">
#document
<!DOCTYPE html>
<html>...</html> // a whole new HTML document
</iframe>
I tried the below code, but I am not able to access the inner HTML content. Please guide.
docu=driver.find_element_by_xpath("//*[@id='asdfghg']").find_element_by_tag_name("iframe")
print(docu.get_attribute("innerHTML"))
Not sure whether you need a particular element or the full source, but the lines below may help:
from selenium import webdriver
driver = webdriver.Chrome(executable_path="C:\\driver\\chromedriver.exe")
driver.get('https://yoururl')
# HTML Source before getting in frame
print(driver.page_source)
# Switch to Frame
driver.switch_to.frame('yourframeID')
# HTML Source after getting in frame
print(driver.page_source)
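The same steps can be wrapped in a small helper so the driver is always returned to the top-level document, even if reading the source fails. This is only a sketch: frame_source is a made-up name, and frame_reference can be whatever driver.switch_to.frame accepts (an id or name, an index, or the iframe element itself).

```python
def frame_source(driver, frame_reference):
    """Return the HTML of one frame, then restore the top-level context."""
    driver.switch_to.frame(frame_reference)
    try:
        # Inside the frame, page_source is the frame's own document.
        return driver.page_source
    finally:
        # Runs on success and on error, so later lookups start from the top.
        driver.switch_to.default_content()
```

For the iframe shown in the question this would be called as frame_source(driver, "xyz").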
I am writing a program to gather data from a website, however at some point in the program I need to enter an email and password into text boxes. The route I am attempting to go is to use driver.execute_script to change the underlying HTML, however I am having difficulty connecting that command with the XPATH value that I use to find the element. I know there are a few threads on here that deal with similar issues, however I have been completely unable to find one that uses XPATH. Any help would be greatly appreciated as I am totally stuck here.
Below is the line of HTML that I am attempting to change, as well as the XPATH value associated with the text box.
XPATH Value
/html/body/center/div[4]/table[2]/tbody/tr/td[4]/table[1]/tbody/tr[2]/td/table/tbody/tr[7]/td[2]/font
HTML:
<input onclick="if (this.value == 'Enter Your Name') this.value='';" onchange="if (this.value == 'Enter Your Name') this.value='';" name="name" type="text" value="Enter Your Name" style="padding-top: 2px; padding-bottom: 6px; padding-left: 4px; padding-right: 4px; width:120px; height:15px; font-size:13px; color: #000000; font-family:Trebuchet MS; background:#FFFFFF; border:1px solid #000000;">
I am attempting to replace value = "Enter Your Name" with value = "Andrew" - or any other name for that matter. Thank you very much for any and all advice, and please let me know if any additional data or info is required.
Send_Keys scripts:
name = driver.find_element_by_xpath('//body/center/form/span/table/tbody/tr/td[1]/input')
name.clear()
name.send_keys("Andrew")
Your <input> tag is contained within an <iframe>, so you'll need to switch the context to the <iframe> first:
driver.switch_to.frame(driver.find_element_by_tag_name("iframe"))
Now that you're "inside" the <iframe>, your send_keys script should work:
name = driver.find_element_by_xpath('//body/center/form/span/table/tbody/tr/td[1]/input')
name.clear()
name.send_keys("Andrew")
Lastly, here's how to switch back to the default content (out of the <iframe>):
driver.switch_to.default_content()
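If you would still rather go the driver.execute_script route from the question, the element found by XPath can be handed to the script as an argument. A minimal sketch, assuming you have already switched into the iframe as shown above; SET_VALUE_JS and set_input_value are hypothetical names, and it uses Selenium 4's find_element(by, value).

```python
# arguments[0] is the element, arguments[1] the new text.
SET_VALUE_JS = "arguments[0].value = arguments[1];"


def set_input_value(driver, xpath, value):
    """Locate an input by XPath and overwrite its value through JavaScript."""
    # "xpath" is the string value of selenium's By.XPATH.
    element = driver.find_element("xpath", xpath)
    driver.execute_script(SET_VALUE_JS, element, value)
    return element
```

Keep in mind that setting value this way does not fire the onclick or onchange handlers on the input, whereas send_keys behaves like real typing, so send_keys is usually the safer choice.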
In my code I'm trying to get the first line of text from a webpage into a variable in Python. At the moment I'm using urlopen to get the whole page for each link I want to read. How do I read only the first line of words on the webpage?
My code:
import urllib2
import numpy as np
line_number = 10
id = (np.arange(1,5))
for n in id:
    link = urllib2.urlopen("http://www.cv.edu/id={}".format(n))
    l = link.read()
I want to extract the word "old car" from the following html code of the webpage:
<html>
<head>
<link rel="stylesheet">
<style>
.norm { font-family: arial; font-size: 8.5pt; color: #000000; text-decoration : none; }
.norm:Visited { font-family: arial; font-size: 8.5pt; color: #000000; text-decoration : none; }
.norm:Hover { font-family: arial; font-size: 8.5pt; color : #000000; text-decoration : underline; }
</style>
</head>
<body>
<b>Old car</b><br>
<sup>13</sup>CO <font color="red">v = 0</font><br>
ID: 02910<br>
<p>
<p><b>CDS</b></p>
Use XPath. It's exactly what we need.
XPath, the XML Path Language, is a query language for selecting nodes from an XML document.
The lxml python library will help us with this. It's one of many. Libxml2, Element Tree, and PyXML are some of the options. There are many, many, many libraries to do this type of thing.
Using XPath
Something like the following, based on your existing code, will work:
import urllib2
import numpy as np
from lxml import html
line_number = 10
id = (np.arange(1,5))
for n in id:
    link = urllib2.urlopen("http://www.cv.edu/id={}".format(n))
    l = link.read()
    tree = html.fromstring(l)
    print(tree.xpath("//b/text()")[0])
The XPath query //b/text() basically says "get the text from the <b> elements on a page." The tree.xpath function call returns a list, and we select the first item using [0]. Easy.
An aside about Requests
The Requests library is the state-of-the-art when it comes to reading webpages in code. It may save you some headaches later.
The complete program might look like this:
from lxml import html
import requests
for nn in range(1, 6):
    page = requests.get("http://www.cv.edu/id=%d" % nn)
    tree = html.fromstring(page.text)
    print(tree.xpath("//b/text()")[0])
Caveats
The urls didn't work for me, so you might have to tinker a bit. The concept is sound, though.
Reading from the webpages aside, you can use the following to test the XPath:
from lxml import html
tree = html.fromstring("""<html>
<head>
<link rel="stylesheet">
</head>
<body>
<b>Old car</b><br>
<sup>13</sup>CO <font color="red">v = 0</font><br>
ID: 02910<br>
<p>
<p><b>CDS</b></p>""")
print(tree.xpath("//b/text()")[0])  # "Old car"
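If lxml is not available, roughly the same lookup can be done with the standard library's html.parser (Python 3). This swaps XPath for a small event-driven parser, so it is a different technique from the one above. FirstBoldText and first_bold_text are names invented for this sketch.

```python
from html.parser import HTMLParser


class FirstBoldText(HTMLParser):
    """Capture the text of the first non-empty <b> element encountered."""

    def __init__(self):
        super().__init__()
        self._in_b = False
        self.result = None

    def handle_starttag(self, tag, attrs):
        if tag == "b" and self.result is None:
            self._in_b = True

    def handle_endtag(self, tag):
        if tag == "b":
            self._in_b = False

    def handle_data(self, data):
        # Only text seen while inside <b> counts, and only the first hit.
        if self._in_b and self.result is None and data.strip():
            self.result = data.strip()


def first_bold_text(markup):
    """Return the first <b> text in an HTML string, or None if there is none."""
    parser = FirstBoldText()
    parser.feed(markup)
    return parser.result


sample = "<html><body><b>Old car</b><br>ID: 02910<br><p><b>CDS</b></p>"
print(first_bold_text(sample))  # Old car
```

Like the XPath version, this relies on the text you want being the first <b> on the page, which may not hold for every page you fetch.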
If you are going to do this on many different webpages that might be written differently, you might find that BeautifulSoup is helpful.
http://www.crummy.com/software/BeautifulSoup/bs4/doc/
As you can see at the bottom of quick start, it should be possible for you to extract all the text from the page and then take whatever line you are interested in.
Keep in mind that this will only work for text present in the HTML itself. Some webpages rely heavily on JavaScript, and requests/BeautifulSoup will not be able to read content that is generated by JavaScript.
Using Requests and BeautifulSoup - Python returns tag with no text
See also an issue I have had in the past, which was clarified by user avi: Want to pull a journal title from an RCSB Page using python & BeautifulSoup