This is my first foray into Selenium. Apologies in advance if this is a stupid/trivial question.
I am trying to scrape information from a webpage. With Python/Selenium I am able to log on to the site and get to the page with the information I need. After the page I need is displayed, I am issuing
html_source = driver.page_source
print html_source
The "source" that gets printed is different from both of these:
1. right click and select View Page Source, and
2. right click and select This Frame, View Frame Source.
The required information is in the View Frame source. All of this is in Firefox.
What do I need to do to get to the Frame Source? There is no frame name in the Frame Source.
Additional information below:
When I right click and select view page source I get the below:
<!DOCTYPE html><html>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<title>xxxxxxx Portal</title>
<base href="">
<link rel="shortcut icon" href="images/logos/xxxxxxx.ico">
<meta http-equiv="Pragma" content="no-cache">
<meta http-equiv="Expires" content="-1"><script type="text/javascript" src=""> </script><script type="text/javascript" src=""> </script><script>
function pushFocus()
function addInProgressPanel(doc)
var d = doc.createElement('div');"inProgressPane";
var tbl = doc.createElement("table");
var row = tbl.insertRow(-1);
var oi = doc.createElement("img");
oi.src= ''+ "images/actions/loading2.gif";
var td = doc.createElement("td");
td = doc.createElement("td");
td.appendChild(doc.createTextNode("Your Request is Being Processed ..."));
return d;
function inProgressScreen(type)
var ws = frames["frameDetail"];
if(!ws) return true;
var ips = ws.document.getElementById("inProgressPane");
if(type) ips.className = 'freezeOn';
else ips.className = 'freezeOff';
}else if(type)
ips = addInProgressPanel(ws.document);
<frameset id="main" framespacing="0" frameborder="0">
<frame id="frameDetail" name="frameDetail" scrolling="auto" marginwidth="0" marginheight="0" src="portal/portal.xsl?x=portal.PortalOutline&lang=en&mode=notices">
When I right click and select This Frame, View Frame source I get
<!DOCTYPE html><html>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<base href="">
<meta http-equiv="Content-Language" content="en-us">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta http-equiv="Pragma" content="no-cache">
<meta http-equiv="Expires" content="-1">
<title>xxxxxxxx Portal</title>
<link rel="stylesheet" type="text/css" href="styles/portal/menu.css">
<link rel="stylesheet" type="text/css" href="styles/portal/header.css">
<link rel="stylesheet" type="text/css" href="styles/portal/footer.css">
<link rel="stylesheet" type="text/css" href="styles/portal/jquery-ui-1.8.7.portal.css">
<link rel="stylesheet" type="text/css" href="styles/portal/">
<link rel="stylesheet" type="text/css" href="styles/portal/portal.css">
<link rel="stylesheet" type="text/css" href="styles/icons.css">
<link rel="stylesheet" type="text/css" href="styles/portal/notifications.css"><script type="text/javascript" src=""> </script><script type="text/javascript" src=""> </script><script src="scripts/widgets/common.js"></script><script src="scripts/controller.js"></script><script src="scripts/portal.js"></script><script src="scripts/jquery/jquery-1.7.2.min.js"></script><script type="text/javascript" src=""> </script><script src="scripts/jquery/jquery-ui-1.8.16.min.js"></script><script src="scripts/jquery/"></script><script src="portal/lang/datePickerLanguage.jsp?lang=en"></script><script src="portal/portal.js"></script><script src="portal/portalNoShim.js"></script><script>
Lots more code here. Did not paste as it was too long. There is no frame name other than the reference to iSessionFrame below:
</script><script language="javascript" src="portal/grades.js"></script></div>
<div id="footer">
<table id="language"><select id="locale" style="width:175px"></select></table>
</div><iframe id="iSessionFrame" name="iSessionFrame" width="0" height="0" src="" style="visibility:hidden;"></iframe></body>
Q: What do I need to do to get to the Frame Source?
A: First switch to the frame you want with driver.switch_to.frame(...), then read driver.page_source to get that frame's HTML.
Note: take a look at the Selenium docs, specifically the section on moving between windows and frames.
You could try switching to the frame by its ID, which is frameDetail in the frameset of your page source.
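A minimal sketch of that switch, assuming driver is a Selenium WebDriver that has already logged in and loaded the page, and that the frame ID is frameDetail as shown in the frameset of the question. The helper name frame_source is hypothetical. Older Selenium releases spelled the call driver.switch_to_frame(...).

```python
def frame_source(driver, frame_ref="frameDetail"):
    """Return the HTML of one frame, then switch back to the top-level page."""
    # switch_to.frame accepts a frame name or id, an index, or a WebElement.
    driver.switch_to.frame(frame_ref)
    try:
        # page_source now reflects the document loaded inside the frame.
        return driver.page_source
    finally:
        # Always return to the top-level document so later lookups behave as expected.
        driver.switch_to.default_content()
```

With a real browser session this would be called as html_source = frame_source(driver) once the page with the frameset is displayed.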
I've been trying to log in to the WhaleWisdom website for the last two weeks, but I'm not able to. I have tried many libraries, such as Scrapy, Selenium, and BeautifulSoup.
from requests import Session
from bs4 import BeautifulSoup as bs

with Session() as s:
    # Fetch the login page first to pick up the CSRF token from the form.
    login_url = s.get("")
    bs_content = bs(login_url.content, "lxml")
    authenticity_token = bs_content.find("input", {"name": "authenticity_token"})["value"]
    login_data = {
        "authenticity_token": authenticity_token,
        "login": "",
        "password": "***********",
        # requests form-encodes values itself, so send the plain text rather than "Log+In".
        "commit": "Log In",
    }
    s.post("", data=login_data)
    html_data = bs(s.get("").content, "html.parser")
Here is the output:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8"/>
<title>WhaleWisdom Dashboard</title>
<meta content="IE=edge" http-equiv="X-UA-Compatible"/>
<meta content="width=device-width,initial-scale=1.0" name="viewport"/>
<meta content="WhaleWisdom tracks 13F, Schedule 13D, and 13G EDGAR filings by hedge funds. Hedge Fund Whale Backtesting and search tools" name="description"/>
<link href="" rel="apple-touch-icon" sizes="76x76"/>
<link href="" rel="icon" sizes="32x32" type="image/png"/>
<link href="" rel="icon" sizes="96x96" type="image/png"/>
<link href="" rel="icon" sizes="16x16" type="image/png"/>
<meta content="r4hQnHlN2H-GtcIb06YHl49VSipApmfQQWIOvZzfnAU" name="google-site-verification">
<link href=",300,400,500,700|Material+Icons" rel="stylesheet" type="text/css"/>
<link href="" rel="stylesheet"/>
<link href="" media="screen" rel="stylesheet">
<meta content="authenticity_token" name="csrf-param">
<meta content="XMAu/LK+dKi/zt/XSTvxIJ8jKl2x8Rx47/ZnAiN6MQCcZmSSlUrOLMeURRr54eCfEWHY8oyS8c6GYxLoIMomNQ==" name="csrf-token">
<strong>We're sorry but the WhaleWisdom Dashboard doesn't work properly without JavaScript enabled. Please enable it to continue.</strong>
<div id="app"></div>
<script src=""></script>
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
ga('create', 'UA-11651599-1', 'auto');
ga('send', 'pageview');
<script async="" charset="utf-8" src="//" type="text/javascript"></script>
You can see that in the HTML output, at line 23, there is an error type message stating that the WhaleWisdom dashboard doesn't work properly without JavaScript.
<strong>We're sorry but the WhaleWisdom Dashboard doesn't work properly without JavaScript enabled. Please enable it to continue.</strong>
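I think this is why it is not working: the dashboard is rendered by JavaScript in the browser, and requests does not execute JavaScript, so you only receive the empty page shell. I can't test it right now because I don't use WhaleWisdom.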
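One way to confirm this diagnosis before going further is to check whether the fetched HTML is only an empty application shell. The sketch below is a rough heuristic, not a general detector: the function name looks_like_js_shell is hypothetical, and the default mount ID "app" is taken from the div id="app" in the output above.

```python
import re


def looks_like_js_shell(html, mount_id="app"):
    """Heuristic: True when the mount <div> exists but contains no markup or text."""
    # Match <div ... id="app" ...> followed only by whitespace and the closing tag.
    pattern = r'<div[^>]*\bid=["\']{}["\'][^>]*>\s*</div>'.format(re.escape(mount_id))
    return re.search(pattern, html, flags=re.IGNORECASE) is not None
```

If this returns True for the response body, parsing it with BeautifulSoup will not reveal the data; the content only exists after a browser (for example one driven by Selenium) has run the page's scripts.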
I am trying to write a Python script to scrape data from this webpage. I am trying to scrape the data from the second table ('class': 'char-pico-table') and am using this script to do so:
import requests

def getPICO(url):
    r = requests.get(url)
    print(r.content)
However, this prints this:
b'<!DOCTYPE html>\n<html class="view">\n <head>\n <title>RobotReviewer: Automating evidence synthesis</title>\n <meta charset="utf-8">\n <meta name="viewport" content="width=device-width, initial-scale=1.0">\n <meta name="google" content="notranslate">\n\n <link rel="stylesheet" type="text/css" href="//">\n <link rel="stylesheet" type="text/css" href="/css/main.css">\n <link rel="stylesheet alternative prefetch" type=text/css href="/css/report.css">\n\n <!-- Preload examples -->\n <link rel="prefetch" href="/report_view/Tvg0-pHV2QBsYpJxE2KW-/html">\n <link rel="prefetch" href="/report_view/_fzGUEvWAeRsqYSmNQbBq/html">\n <link rel="prefetch" href="/report_view/HBkzX1I3Uz_kZEQYeqXJf/html">\n\n <!-- / Preload examples -->\n\n\n <script src="/scripts/modernizr.js"></script>\n <script src="/scripts/spa/scripts/vendor/pdfjs/pdf.js"></script>\n <script src="/scripts/spa/scripts/vendor/compatibility.js"></script>\n <script data-main="/scripts/main" src="/scripts/require.js"></script>\n\n <script>\n PDFJS.disableWebGL = false;\n CSRF_TOKEN = "1508009356##6a03b1bf519972b27a0d871ae4823eb3a3366c0c";\n </script>\n </head>\n\n <body>\n <nav id="top-bar" class="top-bar" data-topbar role="navigation">\n <div>\n <ul class="title-area">\n <li class="name">\n <h1><img src="/img/logo.svg" width="190px"></h1>\n </li>\n </ul>\n\n <section class="top-bar-section">\n <ul class="right">\n <li>About</li>\n </ul>\n </section>\n </div>\n </nav>\n\n <div id="breadcrumbs"></div>\n\n <main id="main"></main>\n\n\n </body>\n</html>'
which is not what I see when I view the page in my browser: it contains none of the data that I wish to scrape. Why is the output different?
When viewing the page in a web browser it looks like this:
[image: expected output]
Based on the comment from @Shahin, I wrote the following code, which returned the data in JSON format, from which I could easily extract what I needed.
result = json.loads(requests.get(''+id+'/json').content)
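A slightly fuller sketch of the same idea, assuming the JSON endpoint is formed by appending the report ID and /json to a URL prefix, as in the line above. The prefix is elided in the original, so base_url is a placeholder, and the helper names report_json_url and fetch_report are hypothetical. urllib is used only to keep the sketch free of third-party dependencies; requests.get(url).json() would do the same job.

```python
import json
from urllib.request import urlopen


def report_json_url(base_url, report_id):
    """Build '<base_url>/<report_id>/json' without doubling the slash."""
    return "{}/{}/json".format(base_url.rstrip("/"), report_id)


def fetch_report(base_url, report_id, opener=urlopen):
    """Download the report and decode it from JSON into Python objects."""
    # opener is injectable so the function can be exercised without network access.
    with opener(report_json_url(base_url, report_id)) as response:
        return json.loads(response.read().decode("utf-8"))
```

The general point is that the browser fills the empty main element from a separate data request, so calling that endpoint directly avoids having to render the page at all.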
I'm trying to use an external css file in my html file.
At first I used the Bootstrap framework, and it worked well.
However, when I tried to customize the web page by adding my own CSS file, it didn't work at all!
Here is my code:
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<link rel="stylesheet" href="">
<script src=""></script>
<script src=""></script>
<link rel="stylesheet" href="custom.css" type="text/css">
background-color: #9acfea;}
Here I just want to change the background color.
'custom.css' is in the same directory as the HTML file.
Also, I've tried to only apply 'custom.css', so I create a new HTML file:
<!DOCTYPE html>
<html lang="en">
<meta charset="utf-8">
<link rel="stylesheet" href="custom.css" type="text/css"/>
It doesn't work either.
I'm confused. Why does the Bootstrap CSS file work perfectly while the customized file doesn't?
By the way, I'm using the Flask framework, but I don't think it matters.
Any suggestions would be appreciated!
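Flask serves static files from the static folder by default, so put custom.css in a static directory next to your application and link it with url_for:
<link href="{{ url_for('static', filename='custom.css') }}" rel="stylesheet"/>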
I'm new to web scraping and have run into a problem.
I tried to extract the list of the states from this site, '', using Python, Selenium and PhantomJS, but I only got the output below.
<!DOCTYPE html><html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=11;chrome=1">
<style type="text/css">html, body {height:100%;margin:0;}</style>
<link rel="shortcut icon" type="image/" href="./../VAADIN/themes/obp/favicon.ico">
<link rel="icon" type="image/" href="./../VAADIN/themes/obp/favicon.ico">
<link rel="stylesheet" type="text/css" href="./../VAADIN/themes/obp/styles.css"><script type="text/javascript" src="./../VAADIN/widgetsets/org.iso.obp.ui.widgetset.applicationWidgetset/org.iso.obp.ui.widgetset.applicationWidgetset.nocache.js?1444641834593"></script><script src=""></script></head>
<body scroll="auto" class=" v-generated-body">
<div id="obpui-105541713" class=" v-app obp">
<div class=" v-app-loading"></div>
You have to enable javascript in your browser to use an application built with Vaadin.
<script type="text/javascript" src="./../VAADIN/vaadinBootstrap.js"></script>
<script type="text/javascript">//<![CDATA[
if (!window.vaadin) alert("Failed to load the bootstrap javascript: ./../VAADIN/vaadinBootstrap.js");
vaadin.initApplication("obpui-105541713",{"heartbeatInterval":300,"versionInfo":{"vaadinVersion":"7.3.10"},"vaadinDir":"./../VAADIN/","authErrMsg":{"message":"Take note of any unsaved data, and <u>click here<\/u> or press ESC to continue.","caption":"Authentication problem"},"widgetset":"org.iso.obp.ui.widgetset.applicationWidgetset","theme":"obp","comErrMsg":{"message":"Take note of any unsaved data, and <u>click here<\/u> or press ESC to continue.","caption":"Communication problem"},"serviceUrl":".","standalone":true,"sessExpMsg":{"message":"Take note of any unsaved data, and <u>click here<\/u> or press ESC key to continue.","caption":"Session Expired"}});
My code in Python is here.
from selenium import webdriver

target_url = ''
driver = webdriver.PhantomJS()
driver.get(target_url)
print(driver.page_source)
Is there any solution for this?
I am facing a very strange problem. When I open a web page generated by Django from the template below, I get an extra line that is not part of the template.
I have tried opening the page in IE, Firefox and Chrome, and I get the extra line in all of them.
My Template
<!DOCTYPE html>
<html lang="en">
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
<link rel="stylesheet" href="{{ STATIC_SERVER_URL }}/static/env_rooms/chat.css" type="text/css"/>
<script src="//"></script>
<script src="{{ STATIC_SERVER_URL }}/static/env_rooms/chat.js" type="text/javascript"></script>
In my browser:
<!DOCTYPE html>
<html lang="en">
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
<link rel="stylesheet" href="http://indlin232:9000/static/env_rooms/chat.css" type="text/css"/>
<script src="//"></script>
<script src="http://indlin232:9000/static/env_rooms/chat.js" type="text/javascript"></script>
<script type="text/javascript" src="" ></script></head>
Any idea where this is coming from?
<script type="text/javascript" src="" ></script>
Django version: 1.6.2
Python version: 2.7.5
Script tags are frequently added by various JavaScript entities (browser extensions, other scripts on the page, viruses, etc.). I can't look up what that particular script is associated with because of my office's firewall, but I would suspect one of those. Joran Beasley seems to have found that it's a virus on your machine.