Python Selenium ChromeDriver: chromedriver dummy frame

I had a simple login script for Facebook that worked perfectly until about a month ago. But yesterday, when I tried running it again, I got this dummy page:
<html xmlns="http://www.w3.org/1999/xhtml">
<head></head>
<body><pre style="word-wrap: break-word; white-space: pre-wrap;">
</pre>
<iframe name="chromedriver dummy frame" src="about:blank"></iframe>
</body>
</html>
I guess they've added some new bot detection. Is there a way to avoid it?
This is my simplified code:
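from selenium import webdriver

path = "/path/to/chromedriver"  # placeholder for the chromedriver location
# Selenium 3-style call; newer Selenium 4 releases pass the path via a Service object
browser = webdriver.Chrome(executable_path=path, service_args=['--ignore-ssl-errors=true', '--ssl-protocol=TLSv1'])
browser.get("https://www.facebook.com/")
for line in browser.page_source.split('\n'):
    print(line)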

I had a similar problem, not with Facebook but with our development pages.
It might be an SSL problem (which the --ignore-ssl-... option might solve).
Mostly, though, this is a waiting problem.
Selenium captures the whole HTML page before the server has finished sending its content.
So it might be solved by using an explicit wait.
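If the page has an element with a unique ID, insert the following code:
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
wait = WebDriverWait(driver, 5)  # wait up to 5 seconds
element = wait.until(EC.visibility_of_element_located((By.ID, 'unique')))  # 'unique' = your element's ID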
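The same waiting idea can be aimed directly at the symptom in the question: keep re-reading page_source until it no longer contains the chromedriver dummy frame marker. Selenium's own WebDriverWait(driver, 10).until(lambda d: ...) is the usual way to do this; the sketch below writes the polling loop out with only the standard library so the logic is visible. It assumes any object with a page_source attribute (such as a WebDriver); the function name and the marker check are illustrative, not part of Selenium.

```python
import time

# Text that appears in the placeholder page shown in the question.
DUMMY_MARKER = "chromedriver dummy frame"


def wait_for_real_page(driver, timeout=10.0, poll=0.5):
    """Poll driver.page_source until it stops looking like the dummy page.

    `driver` is anything with a `page_source` attribute (e.g. a Selenium
    WebDriver). Returns the HTML, or raises TimeoutError after `timeout` s.
    """
    deadline = time.monotonic() + timeout
    while True:
        html = driver.page_source
        if DUMMY_MARKER not in html:
            return html
        if time.monotonic() >= deadline:
            raise TimeoutError("page still shows the chromedriver dummy frame")
        time.sleep(poll)
```

Usage would be html = wait_for_real_page(browser) right after browser.get(...). If the site is actively refusing the automated browser, waiting will not help and the call ends in TimeoutError, which at least makes that case explicit.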

Related

Python Selenium with BeautifulSoup: PHP redirect removes useful information from the URL. How to fix?

I am trying to use Python Selenium with BeautifulSoup to scrape data off a PHP-enabled website.
But the site does an immediate redirect:
<html>
<head>
<meta content="0;url=index.php" http-equiv="refresh"/>
</head>
<body>
<p>Redirecting to TestRail ..</p>
</body>
</html>
... when I just give the URL "https://mysite.thing.com"
When I change it to: "https://mysite.thing.com/index.php" ... I get a 404 error.
How can I get around this? Any suggestions appreciated!
I think it's because PHP pages are generated on the fly with a randomly generated token, so going directly to index.php takes you nowhere because your 'token' is nil. I would go through the motions in Selenium and navigate the site as you would by hand, instead of trying to skip ahead.
I could be totally wrong about the PHP part, by the way; it's a vague memory.
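Using this simpler code worked:
import ssl
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait

# disables certificate checks for Python-side HTTPS (e.g. urllib), not for Chrome itself
ssl._create_default_https_context = ssl._create_unverified_context

driver = webdriver.Chrome()
wait = WebDriverWait(driver, 10)
urlpage = "https://my.site.com"
print(urlpage)
driver.get(urlpage)
html = driver.page_source
print(html)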
This follows the redirect and does what I expect.
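The meta refresh in the question is followed by the browser itself, which is why loading the bare URL in Selenium works. To check which URL such a refresh actually points at, the tag can be parsed with the standard library. This is a hypothetical inspection helper, not something Selenium needs, and it only handles the simple `0;url=...` form (optionally quoted).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _MetaRefreshParser(HTMLParser):
    """Collects the content of the first <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.content = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attribute names arrive lowercased
        if (tag == "meta" and self.content is None
                and (attrs.get("http-equiv") or "").lower() == "refresh"):
            self.content = attrs.get("content") or ""


def meta_refresh_target(html, base_url):
    """Return (delay_seconds, absolute_url) for a meta refresh, or None."""
    parser = _MetaRefreshParser()
    parser.feed(html)
    if parser.content is None:
        return None
    delay, _, rest = parser.content.partition(";")
    rest = rest.strip()
    if rest.lower().startswith("url="):
        rest = rest[4:].strip()
    rest = rest.strip("'\"")  # tolerate url='...' quoting
    # A relative target such as "index.php" is resolved against the page URL.
    return float(delay.strip() or 0), urljoin(base_url, rest)
```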

Selenium raw page source

I am trying to get the source code of a particular site with Selenium, using this Python code:
driver.page_source
But it returns the source after it has been encoded.
The raw file:
<html>
<head>
<title>AAAAAAAA</title>
</head>
<body>
</body>
When I press 'View page source' in Chrome, I see the correct raw source without encoding.
How can this be achieved?
You can try using JavaScript instead of Selenium's page_source property to get the page source:
javascriptPageSource = driver.execute_script("return document.body.outerHTML;")
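Both driver.page_source and the JavaScript above return the browser's current DOM serialized back to HTML (and document.body.outerHTML covers only the body; document.documentElement.outerHTML would include the head as well), whereas 'View page source' shows what the server sent. If the raw response is what you need, one option outside Selenium is to download it with the standard library. This is a sketch with a hypothetical helper name; it does not share the Selenium session's cookies, so it only suits pages that don't require a login.

```python
import urllib.request


def fetch_raw_source(url, default_encoding="utf-8"):
    """Download the HTML as the server sends it, before any JavaScript runs."""
    with urllib.request.urlopen(url) as resp:
        raw = resp.read()
        # Prefer the charset declared in the Content-Type header, if any.
        charset = resp.info().get_content_charset() or default_encoding
    return raw.decode(charset, errors="replace")
```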

Selenium not working except launching browser and open the page

Selenium doesn't execute any of my code beyond launching Chrome.
I don't know why my Selenium script is not working. It just opens the browser (Chrome) with the URL and then does nothing: it doesn't even maximize the window or fill in the form.
Is there anything wrong with my code?
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import re, time, csv
driver = webdriver.Chrome("C:\\Users\\Ashraf%20Misran\\Installer\\chromedriver.exe")
driver.get("file:///C:/Users/Ashraf%20Misran/Devs/project-html/learning-html/selenium sandbox.html")
driver.maximize_window()
username = driver.find_element_by_xpath(".//input")
username.click()
username.send_keys("000200020002")
The page I opened is coded as below:
<!DOCTYPE html>
<html>
<head>
<title>Sandbox</title>
</head>
<body>
<form>
<input type="text" name="username">
</form>
</body>
</html>
I think the problem is with the web page you are trying to open. I would suggest first trying a simple test, such as opening the Google page and entering something in the search field. This will let you verify whether you have implemented the driver initialization correctly.
Update: try this CSS selector: input[name='username']. If the page loads correctly, then the problem is with your web element selector.
I think there may be a problem with the relative XPath locator. Please try this one:
from selenium.webdriver.common.by import By
username = driver.find_element(By.XPATH, "//input")
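Separately from the locator, the paths in the question are worth double-checking (an observation, not a confirmed diagnosis): `%20` is URL encoding, so in the Windows path passed to webdriver.Chrome it is taken literally rather than as a space, while the file:/// URL mixes `%20` with a literal space in `selenium sandbox.html`. Letting pathlib build the URL from the real filesystem path avoids hand-encoding. The helper name below is hypothetical.

```python
from pathlib import Path


def local_file_url(path):
    """Turn a local file path into a correctly escaped file:// URL."""
    # resolve() makes the path absolute; as_uri() percent-encodes spaces etc.
    return Path(path).resolve().as_uri()
```

On Windows, local_file_url(r"C:\Users\Ashraf Misran\Devs\project-html\learning-html\selenium sandbox.html") yields file:///C:/Users/Ashraf%20Misran/Devs/project-html/learning-html/selenium%20sandbox.html, which can be passed straight to driver.get().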

Scraping dynamic data with Python

I have this simple HTML page:
<html>
<body>
<p>Javascript (dynamic data) test:</p>
<p class='jstest' id='yesnojs'>Hello</p>
<button onclick="myFunction()">Try it</button>
<script>
function myFunction() {
document.getElementById('yesnojs').innerHTML = 'GoodBye';
}
</script>
</body>
</html>
I would like to scrape this page with Python to detect when the element with id "yesnojs" reads "GoodBye", that is, when the user has clicked the button. I have tried some tutorials, but I always get "Hello", even if I have clicked and am seeing "GoodBye" on the page.
I hope you can help. Thank you.
PS:
This is the Python code I am using to try to scrape the page:
from selenium import webdriver
chrome_path = "C:\\Users\\Antonio\\Downloads\\chromedriver_win32\\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get("http://localhost/templates/scraping.html")
review = driver.find_elements_by_class_name("jstest")
for post in review:
    print(post.text)
Selenium does not attach to the browser windows you already have open; it launches a new browser with a fresh page. To see "GoodBye", you would have to simulate the click with Selenium (for example, if you're writing a unit test).
Alternatively, if you are looking to build a browser extension that scrapes the page when this event happens, Selenium is not the tool for that.
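A minimal sketch of simulating the click: the script performs the click in the browser it launched and only then reads the element. It relies on WebDriver's find_element(by, value) method and takes locators as (strategy, value) tuples, so the helper itself needs no Selenium import; the function name is hypothetical.

```python
def click_then_read(driver, button_locator, target_locator):
    """Click a button in the Selenium-controlled page, then return an element's text.

    Locators are (strategy, value) tuples, e.g. (By.TAG_NAME, "button").
    """
    driver.find_element(*button_locator).click()
    # myFunction() rewrites the text; for asynchronous updates, add a WebDriverWait here.
    return driver.find_element(*target_locator).text
```

With the page from the question loaded via driver.get(...), the call would be click_then_read(driver, (By.TAG_NAME, "button"), (By.ID, "yesnojs")), with By imported from selenium.webdriver.common.by.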

Python selenium: Scrolling inside iframe

I am able to switch between tabs and access all elements, but I am unable to scroll inside this iframe. The script executes without error, but no scrolling happens. Please help. The code I am using is as follows:
# switching to iframe
iframe = self.browser.find_elements_by_tag_name('iframe')[0]
self.browser.switch_to_frame(iframe)
time.sleep(1)
#clicking tab 4
self.force_click('xpath=/html/body/div/md-content/md-tabs/md-tabs-wrapper/md-tabs-canvas/md-pagination-wrapper/md-tab-item[4]/span')
time.sleep(4)
#scrolling
elm = self.browser.find_elements_by_tag_name('html')
elm[0].send_keys(Keys.END)
The HTML of the iframe is as follows:
<iframe id="widget-iframe" class="widget-iframe" frameborder="0" ap-onunload="vm.onFrameUnload()" ap-onload="vm.onFrameLoad()" ng-src="/apps/launchpad-view-widget/" src="/apps/launchpad-view-widget/">
<!DOCTYPE html>
<html class="ng-scope" ng-app="launchpadViewWidget">
<head>
<body>
</html>
</iframe>
I suggest making the window bigger, so that you won't have to scroll:
self.browser.set_window_size(1920, 1080)  # or (4096, 3112), a 4K resolution, if needed
--
Edit:
You can also get the HTML code from the browser and parse it with BeautifulSoup:
html_source = browser.page_source
For Python 2.x I recommend:
html_source = browser.page_source.encode('utf-8')
Then find the table you want without worrying about scrolling.
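Another option, hedged because it depends on how this particular page is built: in Angular Material layouts the scrollbar often belongs to an inner container rather than to the document, in which case sending Keys.END to the html element has no visible effect. The sketch below scrolls a given element directly with JavaScript and is meant to run after switching into the iframe. The helper name is hypothetical, and treating the md-content element from the question's XPath as the scroll container is an assumption.

```python
def scroll_to_bottom(driver, element):
    """Scroll a scrollable element (not the window) to its bottom via JavaScript.

    Call after driver.switch_to.frame(...), so `element` belongs to the iframe.
    Returns the element's scrollTop after scrolling.
    """
    return driver.execute_script(
        "arguments[0].scrollTop = arguments[0].scrollHeight;"
        " return arguments[0].scrollTop;",
        element,
    )
```

For example, container = self.browser.find_element(By.TAG_NAME, "md-content") followed by scroll_to_bottom(self.browser, container). If it is the iframe's whole document that scrolls, self.browser.execute_script("window.scrollTo(0, document.body.scrollHeight);") does the same for the frame's window.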
I've run into this before: I couldn't see the values past the scroll area. The solution was to get the JavaScript object that populates the table, as JSON. You can do this with the JavaScript executor, driver.execute_script.
Hope that helps!
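A sketch of that approach, assuming the table's data lives in a JavaScript variable reachable from the page; the expression `window.tableData` in the usage below is hypothetical, so look up the real one in the page's scripts or dev tools. execute_script can often return plain JavaScript objects directly as Python dicts and lists; going through JSON.stringify, as described above, makes the conversion explicit.

```python
import json


def read_js_data(driver, js_expression):
    """Return the value of a JavaScript expression as Python data.

    The browser serializes it with JSON.stringify; Python parses it back.
    """
    payload = driver.execute_script(f"return JSON.stringify({js_expression});")
    # JSON.stringify(undefined) gives undefined, which arrives as None.
    return json.loads(payload) if payload is not None else None
```

Usage: rows = read_js_data(driver, "window.tableData") after switching into the iframe that holds the table.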
