Python Selenium WebDriverException Empty HTML


My code:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
browser = webdriver.Chrome(service=Service(R"C:\ProgramData\Google\ChromeDriver\chromedriver.exe"))
At the last line, I get a WebDriverException with the following details:
Message: <!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<body>
</body>
</html>
Although it worked perfectly yesterday, for some reason it doesn't work today, and the error message isn't much help. I've tried many different ChromeOptions, and also Edge with EdgeDriver, but I always get the same error with that empty HTML message. I've done a lot of Googling but can't find a solution that works, because every other post has a more detailed error to troubleshoot.
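The message above is an HTML document rather than a driver error. Selenium's Python client talks to chromedriver (or msedgedriver) over HTTP on the local machine, and, as far as I can tell for Selenium 4, when the reply it gets back is not JSON it puts the reply body into the exception message as-is. So one thing worth ruling out, as an assumption rather than a confirmed diagnosis, is that something else, such as a proxy configured through environment variables, is answering those local requests; that would also fit the same error appearing with both Chrome and Edge. A minimal stdlib sketch that lists the proxy-related variables in the current process (the names `PROXY_VARIABLES` and `proxy_settings` are hypothetical):

```python
import os
from typing import Mapping

# Variable names that Python HTTP clients commonly read for proxy settings.
PROXY_VARIABLES = {"http_proxy", "https_proxy", "all_proxy", "no_proxy"}


def proxy_settings(environ: Mapping[str, str] = os.environ) -> dict:
    """Return the proxy-related variables in `environ`, matched case-insensitively."""
    return {
        name: value
        for name, value in environ.items()
        if name.lower() in PROXY_VARIABLES
    }
```

Run `print(proxy_settings())` at the top of the failing script. If anything shows up, one experiment is to delete those entries from `os.environ` before `webdriver.Chrome(...)` is created and see whether the error changes.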

Related

Python Selenium with BeautifulSoup: PHP redirect removes useful information from the URL. How to fix?

I am trying to use Python Selenium with BeautifulSoup to scrape data off a PHP-enabled website.
But the site does an immediate redirect:
<html>
<head>
<meta content="0;url=index.php" http-equiv="refresh"/>
</head>
<body>
<p>Redirecting to TestRail ..</p>
</body>
</html>
... when I just give the URL "https://mysite.thing.com"
When I change it to: "https://mysite.thing.com/index.php" ... I get a 404 error.
How to get around this? Any suggestions appreciated!
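To see exactly where that redirect page sends the browser, you can resolve the `url=` part of the meta refresh against the URL the page was actually served from (after `driver.get()`, that is `driver.current_url`, which can differ from what you typed if the server also sent an HTTP redirect). A minimal stdlib sketch; `MetaRefreshParser` and `meta_refresh_target` are hypothetical helpers, and only the simple "delay;url=target" form shown above is handled:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class MetaRefreshParser(HTMLParser):
    """Collects the target of a <meta http-equiv="refresh"> tag, if any."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; self-closing <meta .../>
        # also ends up here via the default handle_startendtag().
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("http-equiv") or "").lower() == "refresh":
            # content looks like "0;url=index.php": a delay, then the target URL.
            _, _, rest = (attrs.get("content") or "").partition(";")
            key, _, url = rest.strip().partition("=")
            if key.strip().lower() == "url" and url:
                self.target = url.strip().strip("'\"")


def meta_refresh_target(html, base_url):
    """Return the absolute URL a meta refresh points to, or None if there is none."""
    parser = MetaRefreshParser()
    parser.feed(html)
    return urljoin(base_url, parser.target) if parser.target else None
```

For the page above, a base of https://mysite.thing.com gives https://mysite.thing.com/index.php, but if the page were really served from a subpath such as https://mysite.thing.com/testrail/ (a hypothetical path), the same relative `index.php` would resolve to https://mysite.thing.com/testrail/index.php.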

I think it's because PHP-generated pages are built on the fly with a randomly generated token, so going directly to index.php takes you nowhere because your 'token' is nil. I would go through the motions in Selenium and navigate the site as you would by hand, instead of trying to skip ahead.
I could be totally wrong about the PHP thing, BTW; it's a vague memory...

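This simpler code worked:
import ssl
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
# Disables certificate checks for Python's own HTTPS clients (urllib, http.client);
# it does not change how Chrome validates certificates.
ssl._create_default_https_context = ssl._create_unverified_context
driver = webdriver.Chrome()
wait = WebDriverWait(driver, 10)
urlpage = "https://my.site.com"
print(urlpage)
driver.get(urlpage)
html = driver.page_source
print(html)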
This follows the redirect and does what I expect.

Selenium raw page source

I am trying to get the source code of a particular site with the help of Selenium with:
Python code:
driver.page_source
But it returns it after it has been encoded.
The raw file:
<html>
<head>
<title>AAAAAAAA</title>
</head>
<body>
</body>
When I press 'View page source' in Chrome, I see the correct raw source without any encoding.
How can this be achieved?

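You can try getting the page source with JavaScript instead of the page_source property:
# documentElement is the <html> element, so this includes <head>; document.body.outerHTML would return only <body>.
javascriptPageSource = driver.execute_script("return document.documentElement.outerHTML;")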
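A different angle on the question: `driver.page_source` (and the JavaScript variant above) return the browser's current DOM serialized back to text, while 'View page source' shows the document as the server sent it. If the latter is what you need, one option outside Selenium is to request the URL directly. A minimal stdlib sketch, assuming the page does not depend on cookies or a login session held by the Selenium browser (this request does not share them); the helper name `fetch_raw_source` and the User-Agent value are assumptions:

```python
import urllib.request


def fetch_raw_source(url, timeout=10):
    """Return the document exactly as the server sends it, decoded to text.

    Nothing is executed or re-serialized by a browser, unlike driver.page_source.
    """
    # Assumption: some sites reject urllib's default User-Agent, so send a browser-like one.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Use the charset from the Content-Type header, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Call it with the same URL you pass to `driver.get()` and compare the result with `driver.page_source` to see what the browser changed.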

Selenium not working except launching browser and open the page

My Selenium script does nothing except launch Chrome.
I don't know why it isn't working. It just opens the browser (Chrome) at the URL and then does nothing: it doesn't even maximize the window, let alone fill in the form.
Is there anything wrong with my code?
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import re, time, csv
driver = webdriver.Chrome("C:\\Users\\Ashraf%20Misran\\Installer\\chromedriver.exe")
driver.get("file:///C:/Users/Ashraf%20Misran/Devs/project-html/learning-html/selenium sandbox.html")
driver.maximize_window()
username = driver.find_element_by_xpath(".//input")
username.click()
username.send_keys("000200020002")
The page I opened is coded as below:
<!DOCTYPE html>
<html>
<head>
<title>Sandbox</title>
</head>
<body>
<form>
<input type="text" name="username">
</form>
</body>
</html>
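One thing worth checking in the code above: it mixes URL and filesystem conventions. `%20` is URL encoding, so in the chromedriver path (a plain Windows path) it is a literal "%20", not a space, while the `file:///` URL contains an unencoded space in "selenium sandbox.html". Whether this causes the hang is an assumption, but it is cheap to rule out. A small stdlib sketch that builds the `file://` URL from an ordinary Windows path so every space is encoded the same way (`windows_path_to_file_url` is a hypothetical helper; the path is the one from the question):

```python
from pathlib import PureWindowsPath


def windows_path_to_file_url(path):
    """Convert an absolute Windows path to a file:// URL, percent-encoding spaces."""
    # PureWindowsPath parses Windows paths on any OS; as_uri() requires an absolute path.
    return PureWindowsPath(path).as_uri()
```

Pass the result of `windows_path_to_file_url(r"C:\Users\Ashraf Misran\Devs\project-html\learning-html\selenium sandbox.html")` to `driver.get()`, and if the folder name really contains a space, write the chromedriver path with a real space rather than `%20`.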

I think the problem is with the web page you are trying to open. I would suggest first trying a simple test, such as opening the Google page and entering something in the search field. That will verify whether you initialized the driver correctly.
Update: try this CSS selector: input[name='username']. If the page loads correctly but the element is still not found, the problem is your element selector.
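In Selenium 4 the `find_element_by_*` helpers used in the question are gone (removed in Selenium 4.3), so the suggested selector is passed to `find_element()` together with a locator strategy. `By.CSS_SELECTOR` is just the string "css selector", which lets this sketch run without importing Selenium; with Selenium installed, `from selenium.webdriver.common.by import By` and `By.CSS_SELECTOR` is the usual spelling. The helper name `fill_username` is hypothetical, and `driver` is any WebDriver instance:

```python
# Same value as selenium.webdriver.common.by.By.CSS_SELECTOR in Selenium 4.
CSS_SELECTOR = "css selector"


def fill_username(driver, value, selector="input[name='username']"):
    """Locate the username field with a CSS selector, focus it and type `value`."""
    # find_element raises NoSuchElementException if nothing matches the selector.
    field = driver.find_element(CSS_SELECTOR, selector)
    field.click()
    field.send_keys(value)
    return field
```

After `driver.get(...)`, call `fill_username(driver, "000200020002")`. If that raises `NoSuchElementException` while the page is visibly loaded, the answer above points at the selector rather than the driver setup.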

I think there is a problem with the relative XPath locator. Please try this one instead:
from selenium.webdriver.common.by import By
username = driver.find_element(By.XPATH, "//input")

Python Selenium ChromeDriver: chromedriver dummy frame

I had this simple login script for Facebook that used to work perfectly until about a month ago. But yesterday, when I tried running it again, I got this dummy page:
<html xmlns="http://www.w3.org/1999/xhtml">
<head></head>
<body><pre style="word-wrap: break-word; white-space: pre-wrap;">
</pre>
<iframe name="chromedriver dummy frame" src="about:blank"></iframe>
</body>
</html>
I guess they've added some new bot detection. Is there a way to get around it?
This is my simplified code:
browser = webdriver.Chrome(executable_path=path, service_args=['--ignore-ssl-errors=true', '--ssl-protocol=TLSv1'])
browser.get("https://www.facebook.com/")
for line in browser.page_source.split('\n'):
    print(line)
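The markup quoted above is a placeholder, not Facebook's page, so a script can at least detect it before trying to parse anything. A minimal stdlib sketch; it only looks for the iframe name shown in the question, and assuming that name always appears in the placeholder is an assumption (`is_chromedriver_placeholder` is a hypothetical helper):

```python
from html.parser import HTMLParser


class _IframeNames(HTMLParser):
    """Records the name attribute of every <iframe> in a document."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            self.names.append(dict(attrs).get("name"))


def is_chromedriver_placeholder(page_source):
    """Return True if page_source contains the "chromedriver dummy frame" iframe."""
    parser = _IframeNames()
    parser.feed(page_source)
    return "chromedriver dummy frame" in parser.names
```

For example, check `is_chromedriver_placeholder(browser.page_source)` right after `browser.get(...)` and stop or retry instead of printing the placeholder.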

I have a similar problem, not with Facebook but with our own pages under development.
It might be an SSL problem (which the --ignore-ssl-... option might solve).
Mostly, though, this is a waiting problem.
Selenium captures the whole HTML page before the server has finished outputting its content.
So it might be solved with an explicit wait.
If the page has an element with a unique ID, insert the following code:
wait = WebDriverWait(driver, 5)
element = wait.until(EC.visibility_of_element_located((By.ID, 'unique')))
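The two lines above assume `from selenium.webdriver.support.ui import WebDriverWait`, `from selenium.webdriver.support import expected_conditions as EC` and `from selenium.webdriver.common.by import By`. As for what the wait does: `WebDriverWait.until()` calls the condition repeatedly (every 0.5 s by default), treats `NoSuchElementException` as "not yet", and raises `TimeoutException` once the timeout passes. The stdlib sketch below only illustrates that polling loop; it is not Selenium's code, and `wait_until` is a hypothetical name:

```python
import time


def wait_until(condition, timeout=5.0, poll_interval=0.5, ignored_exceptions=()):
    """Poll condition() until it returns a truthy value; raise TimeoutError otherwise.

    Illustrates the explicit-wait pattern behind WebDriverWait.until().
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
        except ignored_exceptions:
            result = None  # treat an ignored exception like "not ready yet"
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)
```

With a real driver, a condition such as `lambda: driver.find_elements(By.ID, 'unique')` works, because an empty list is falsy, so the loop keeps polling until the element appears.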

Unable to replicate examples of running ironpython in browser with silverlight

Basically, I am following this tutorial: http://blog.jimmy.schementi.com/2010/03/pycon-2010-python-in-browser.html
According to it, this code should run fine:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<script type="text/javascript"
src="http://gestalt.ironpython.net/dlr-20100305.js"></script>
<script type="text/python" src="http://github.com/jschementi/pycon2010/raw/master/repl.py"></script>
</head>
<body>
<script type="text/python">
window.Alert("Hello from Python!")
</script>
</body>
</html>
And in fact, it does, for example here: http://ironpython.net/browser/examples/pycon2010/start.html
You will see it if you have Silverlight installed.
But the problem is that when I try to run the same code on my PC, I can't. I create a text file, copy this code into it, save it as test.html, and open it in Firefox, but nothing happens. The code does not execute; I just get a blank page.
I don't understand why the same code runs here: http://ironpython.net/browser/examples/pycon2010/start.html, but not on my PC, given that it is client-side code, not server-side.

It's failing to download repl.py. It looks like a bug: for cross-domain downloads it falls back to the DOM downloader, which then throws. As a workaround, copy repl.py to your web server as well; here it is working: http://www.schementi.com/silverlight/Sunny88.html.
Also, locally you must run under a local web server, because Silverlight can't download any files from the http:// zone while running from the file:// zone.
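For the local web server part, Python's standard library is enough: from a terminal, `python -m http.server 8000 --directory <folder>` serves a folder at http://localhost:8000/. The same thing from a script, as a minimal sketch (the helper name `serve_directory` is hypothetical; it binds to 127.0.0.1 only):

```python
import functools
import http.server
import threading


def serve_directory(directory, port=8000):
    """Serve `directory` at http://127.0.0.1:<port>/ from a background thread.

    Returns the server; call server.shutdown() and server.server_close() to stop it.
    """
    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=directory)
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Point it at the folder containing test.html, then open http://127.0.0.1:8000/test.html in the browser instead of the file:// path.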

Develop Reference
