Python Selenium On Local HTML String

I am trying to run Selenium on a local HTML string but can't seem to find any documentation on how to do so. I retrieve HTML source from an e-mail API, so Selenium won't be able to parse it directly. Is there any way to alter the following so that it would read the HTML string below?
Python Code for remote access:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
driver = webdriver.Firefox()
driver.get("http://www.python.org")
assert "Python" in driver.title
elem = driver.find_element_by_class_name("q")
Local HTML Code:
s = "<body>
<p>This is a test</p>
<p class="q">This is a second test</p>
</body>"

If you don't want to create a file or load a URL before being able to replace the content of the page, you can always leverage the Data URLs feature, which supports HTML, CSS and JavaScript:
from selenium import webdriver
driver = webdriver.Chrome()
html_content = """
<html>
<head></head>
<body>
<div>
Hello World =)
</div>
</body>
</html>
"""
driver.get("data:text/html;charset=utf-8,{html_content}".format(html_content=html_content))

If I understand the question correctly, I can imagine 2 ways to do this:
Save the HTML code as a file and load it with a file:///file/location URL. The problem is that the file's location and how the browser loads it may differ across OSs and browsers, but on the other hand the implementation is very simple (a minimal sketch follows these two options).
Another option is to inject your code into some already-loaded page and then work with it as regular dynamic HTML. I think this is more reliable, but it is also more work. This question has a good example.
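A minimal sketch of the first option, assuming a throwaway temporary file is acceptable (the HTML string is the one from the question):
import pathlib
import tempfile
from selenium import webdriver
s = '<body><p>This is a test</p><p class="q">This is a second test</p></body>'
# write the HTML string to a temporary file, then load it via a file:// URL
with tempfile.NamedTemporaryFile("w", suffix=".html", delete=False) as f:
    f.write(s)
    path = pathlib.Path(f.name)
driver = webdriver.Firefox()
driver.get(path.as_uri())  # e.g. file:///tmp/tmpabc123.html
elem = driver.find_element_by_class_name("q")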

Here was my solution for doing basic generated tests without having to make lots of temporary local files.
import json
from selenium import webdriver
driver = webdriver.PhantomJS() # or your browser of choice
html = '''<div>Some HTML</div>'''
driver.execute_script("document.write({})".format(json.dumps(html)))  # json.dumps adds the quoting and escaping needed for the JS string
# your tests

If I am reading correctly, you are simply trying to get text from an element. If that is the case, then the following bit should fit your needs:
elem = driver.find_element_by_class_name("q").text
print(elem)
Assuming "q" is the element you need.

Related

Get all <thspan> contents in Python Selenium

Say that I have a piece of HTML code that looks like this:
<html>
<body>
<thspan class="sentence">He</thspan>
<thspan class="sentence">llo</thspan>
</body>
</html>
And I wanted to get the content of both and connect them into a string in Python Selenium.
My current code looks like this:
from selenium import webdriver
from selenium.webdriver.common.by import By
browser = webdriver.Chrome()
thspans = browser.find_elements(By.CLASS_NAME, "sentence")
context = ""
for thspan in thspans:
    context.join(thspan.text)
The code can run without any problem, but the context variable doesn't contain anything. How can I get the content of both and connect them into a string in Python Selenium?
Use context += thspan.text instead of context.join(thspan.text), just like @Rajagopalan said.
Hi! You were not pointing the browser at the page you actually want to scrape the data from, and you were misusing the join function. Here is code that will work for you:
from selenium import webdriver
from selenium.webdriver.common.by import By
browser = webdriver.Chrome()
# Put the absolute path to your html file if you are working locally, or
# the URL of the domain you want to scrape
browser.get('file:///your/absolute/path/to/the/html/code/index.html')
thspans = browser.find_elements(By.CLASS_NAME, "sentence")
context = ''
print('thspans', thspans, end='\n\n')
for thspan in thspans:
    context += thspan.text
print(context)
Good luck!
Use this line without the loop:
context = "".join([thspan.text for thspan in thspans])

Selenium not working except launching the browser and opening the page

I see that my Selenium code cannot execute anything except launching Chrome.
I don't know why my Selenium is not working. It just opens the browser (Chrome) at the URL and then does nothing else: it doesn't maximize the window or even fill in the form.
Is there anything wrong with my code:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import re, time, csv
driver = webdriver.Chrome("C:\\Users\\Ashraf%20Misran\\Installer\\chromedriver.exe")
driver.get("file:///C:/Users/Ashraf%20Misran/Devs/project-html/learning-html/selenium sandbox.html")
driver.maximize_window()
username = driver.find_element_by_xpath(".//input")
username.click()
username.send_keys("000200020002")
The page I opened is coded as below:
<!DOCTYPE html>
<html>
<head>
<title>Sandbox</title>
</head>
<body>
<form>
<input type="text" name="username">
</form>
</body>
</html>
I think the problem is with the web page you are trying to open. I would suggest starting with a simple test, like opening the Google page and entering something in the search field, so you can verify that the driver initialization is implemented correctly.
Update: try this CSS selector: input[name='username']. If the page loads correctly, then the problem is with your web element selector.
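For reference, that selector would look something like this in Python Selenium (a sketch, reusing the driver from the question):
username = driver.find_element_by_css_selector("input[name='username']")
username.send_keys("000200020002")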
I think there is a problem with the relative XPath locator. Please try this one instead (in Python syntax):
username = driver.find_element_by_xpath("//input")

Unable to fetch information as a 3rd-party browser plugin is blocking JS from working

I wanted to extract data from https://www.similarweb.com/ but when I run my code it shows the following (I converted the HTML output to text):
Pardon Our Interruption http://cdn.distilnetworks.com/css/distil.css" media="all" /> http://cdn.distilnetworks.com/images/anomaly-detected.png" alt="0" />
Pardon Our Interruption...
As you were browsing www.similarweb.com something about your browser made us think you were a bot. There are a few reasons this might happen:
You're a power user moving through this website with super-human speed.
You've disabled JavaScript in your web browser.
A third-party browser plugin, such as Ghostery or NoScript, is preventing JavaScript from running. Additional information is available in this support article .
After completing the CAPTCHA below, you will immediately regain access to www.similarweb.com.
if (!RecaptchaOptions){ var RecaptchaOptions = { theme : 'blackglass' }; }
You reached this page when attempting to access https://www.similarweb.com/ from 14.139.82.6 on 2017-05-22 12:02:37 UTC.
Trace: 9d8ae335-8bf6-4218-968d-eadddd0276d6 via 536302e7-b583-4c1f-b4f6-9d7c4c20aed2
I have written the following piece of code:
import urllib
from BeautifulSoup import *
url = "https://www.similarweb.com/"
html = urllib.urlopen(url).read()
soup = BeautifulSoup(html)
print (soup.prettify())
# tags = soup('a')
# for tag in tags:
#     print 'TAG:', tag
#     print tag.get('href', None)
#     print 'Contents:', tag.contents[0]
#     print 'Attrs:', tag.attrs
Can anyone help me as to how I can extract the information?
I tried with requests; it failed. selenium seems to work.
>>> from selenium import webdriver
>>> driver = webdriver.Chrome()
>>> driver.get('https://www.similarweb.com/')
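From there you can hand the rendered source to BeautifulSoup for the actual extraction; a minimal sketch (which tags you pull out depends on what you need):
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(driver.page_source, 'html.parser')
>>> for tag in soup('a'):
...     print(tag.get('href'))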

Download HTML of a webpage that's already loaded

I am writing a program using Python and selenium to automate logging into a website. The website asks a security question for additional verification. Clearly the answer I would send using "send_keys" would depend on the question asked so I need to figure out what is being asked based on the text. BeautifulSoup can be used to parse through the HTML but in all the examples I have seen you have to give a URL to then read the page content. How do I read the content of a page that's already open? The code I am using is:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
chromedriver = 'C:\\Program Files\\Google\\chromedriver.exe'
browser = webdriver.Chrome(chromedriver)
browser.get('http://www.aaaa.com')
loginElem = browser.find_element_by_id('bbbb')
loginElem.send_keys('cccc')
passwordElem = browser.find_element_by_id('dddd')
passwordElem.send_keys('eeee')
passwordElem.send_keys(Keys.RETURN)
The page with the security questions loads after this and that's the page I want the URL of.
I also tried finding by element, but for some reason it wasn't working, which is why I am trying a workaround. Below is the HTML for the entire div class where the question is. Alternatively, maybe you can help me search for the right one.
<div class="answer-section">
<p> Please answer your challenge question so we can help
verify your identity.
</p> <label for="tlpvt-challenge-answer"> What is the name of your dog?
</label>
<input type="text" id="tlpvt-challenge-answer" class="tl-private gis- mask"
name="challengeQuestionAnswer" value=""/>
</div>
Well, if you want to use BeautifulSoup, you can retrieve the source code from the webdriver and then parse it:
chromedriver = 'C:\\Program Files\\Google\\chromedriver.exe'
browser = webdriver.Chrome(chromedriver)
browser.get('http://www.aaaa.com')
# call page_source attr from a webdriver instance to
# retrieve HTML source code
html = browser.page_source
# parse it with BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')
label = soup.find('label', {'for': 'tlpvt-challenge-answer'})
print(label.get_text())
output:
$ What is the name of your dog?
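If the goal is to answer the challenge automatically, one rough sketch is to map known questions to answers and continue from the snippet above (the answers dict and the "Rex" value are purely illustrative; the input id comes from the question's HTML):
from selenium.webdriver.common.keys import Keys
# hypothetical mapping from challenge question text to the answer to type
answers = {"What is the name of your dog?": "Rex"}
question = label.get_text(strip=True)  # label comes from the BeautifulSoup snippet above
answer_input = browser.find_element_by_id("tlpvt-challenge-answer")
answer_input.send_keys(answers[question])
answer_input.send_keys(Keys.RETURN)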

Python Selenium Webdriver: finding #document element

I have been using Python's Selenium WebDriver to get elements from this HTML code.
However, I could not access any of the elements inside the #document tag.
I used both
driver.find_element_by_xpath("html/body/div[@id='frame']/iframe/*"), and I tried
elem = driver.find_element_by_tag_name("iframe"), followed by
elem.find_element_by_xpath to find inner elements, but failed.
I also tried driver.switch_to_frame(driver.find_element_by_tag_name("iframe")), followed by XPath expressions to find inner elements, but that did not work either.
Frame:
<div>
<iframe>
#document
<html>
<body>
<div>
....
</div>
</body>
</html>
</iframe>
</div>
Switching to the iframe and then using the normal query methods is the correct approach to use. I use it successfully throughout a large test suite.
Remember to switch back to the default content when you've finished working inside the iframe though.
Now, to solve your problem: how are you serving the contents of the iframe? Have you literally just written the HTML and saved it to a file, or are you looking at an example site? You might find that the iframe doesn't actually contain the content you expect. Try this:
from selenium.webdriver import Firefox
b = Firefox()
b.get('http://localhost:8000') # or wherever you are serving this html from
iframe = b.find_element_by_css_selector('iframe')
b.switch_to_frame(iframe)
print(b.page_source)
That will be the HTML inside the iframe. Are the contents what you expect, or is it mainly empty? If it's empty, then I suspect it's because you need to serve the contents of the iframe separately.
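And to echo the earlier point about restoring the context, a small sketch continuing from the snippet above (the div selector is just an assumption about what the frame contains):
# inside the iframe: query as usual
inner = b.find_element_by_css_selector('div')
print(inner.text)
# return to the top-level document when you are done with the frame
b.switch_to_default_content()  # newer Selenium versions use b.switch_to.default_content()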
Web application developers are generally not very fond of iframes. As per their suggestion, I added a wait using expected conditions. After that you can fetch the values of your tags; here I have called the result val1.
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait as wait
from selenium.webdriver.support import expected_conditions as EC
.... #some code
.... #some code
wait(driver, 60).until(EC.frame_to_be_available_and_switch_to_it((By.XPATH, '//*[@id="iframeid"]')))
.... #some code
.... #some code
val1 = wait(browser, 20).until(
    EC.presence_of_element_located((By.XPATH, '//tr[@cid="1"]/td[@ret="2" and @c="21"]')))
Hope this helps!
