I have been trying to get an embedded Power BI report hosted on a Flask page. When the page attempts to load, the console throws CORS errors left and right, and it seems as though the packets carrying the data are getting blocked. I have tried setting 'Access-Control-Allow-Origin' to a wildcard in our response header, but that doesn't solve the issue. However, we believe it is at least related to headers, because when we run it in a version of Chrome with security disabled, the report loads as it should. Another unit that works in parallel with me has been able to get the same(ish) code working on a C#-hosted site, and they didn't need to add any origin headers or anything, so I'm fairly certain this is an issue specific to Flask/Power BI. Here is the relevant code from our route serving file:
@app.route('/pm/getembedinfo', methods=['GET'])
def get_embed_info():
    '''Returns report embed configuration'''
    config_result = utils.Utils.check_config(app)
    if config_result is not None:
        return json.dumps({'errorMsg': config_result}), 500
    try:
        embed_info = pbiembedservice.PbiEmbedService().get_embed_params_for_single_report(app.config['WORKSPACE_ID'], app.config['REPORT_ID'])
        # embed_info.headers.add('Access-Control-Allow-Origin', '*')
        response = make_response(embed_info)
        response.headers['Access-Control-Allow-Origin'] = '*'
        return response
    except Exception as ex:
        return json.dumps({'errorMsg': str(ex)}), 500

@app.route('/pm/favicon.ico', methods=['GET'])
def getfavicon():
    '''Returns path of the favicon to be rendered'''
    return send_from_directory(os.path.join(app.root_path, 'static'), 'img/favicon.ico', mimetype='image/vnd.microsoft.icon')

@app.route('/pm/power_bi_dashboard', methods=["GET"])
def power_bi_dashboard():
    response = make_response(render_template('power_bi_dashboard.html'))
    response.headers.add('Access-Control-Allow-Origin', '*')
    return response
And here is the relevant part of the html template:
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta http-equiv="X-UA-Compatible" content="ie=edge">
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/bootstrap@4.5.3/dist/css/bootstrap.min.css" integrity="sha384-TX8t27EcRE3e/ihU7zmQxVncDAy5uIKz4rEkgIXeMed4M0jlfIDPvg6uqKI2xXr2" crossorigin="anonymous">
<link rel="stylesheet" href="{{ url_for('static', filename='css/index.css') }}">
<link rel="stylesheet" href="{{ url_for('static', filename='css/global.css') }}">
<title>Traffic Monitoring Dashboard</title>
</head>
<body>
<header class="embed-container col-lg-12 col-md-12 col-sm-12 shadow">
<p>
Traffic Monitoring Dashboard
</p>
</header>
<main class="row">
<section id="report-container" class="embed-container col-lg-offset-4 col-lg-7 col-md-offset-5 col-md-7 col-sm-offset-5 col-sm-7 mt-5">
</section>
<!-- Used to display report embed error messages -->
<section class="error-container m-5">
</section>
</main>
<script src="https://code.jquery.com/jquery-3.5.1.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/bootstrap@4.5.3/dist/js/bootstrap.min.js" integrity="sha384-w1Q4orYjBQndcko6MimVbzY0tgp4pWB4lZ7lr30WKz0vr/aWKhXdBNmNb5D92v7s" crossorigin="anonymous"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/powerbi-client/2.15.1/powerbi.min.js" integrity="sha512-OWIl8Xrlo8yQjWN5LcMz5SIgNnzcJqeelChqPMIeQGnEFJ4m1fWWn668AEXBrKlsuVbvDebTUJGLRCtRCCiFkg==" crossorigin="anonymous"></script>
<script src="{{ url_for('static', filename='js/index.js') }}"></script>
</body>
While trying to work out how data gets loaded on a web page, I was looking at this site: investorscout.co/investors.
I checked the Network tab to see how the data from the backend is rendered on the page. I also looked at the WS (WebSocket) requests, but no luck.
I am confused as to how the site is able to display the data when none of the requests in the Network tab show it.
I aim to fetch the data using requests and bs4.
Sending a GET request to the page https://investorscout.co/investors returns a response that contains little more than references to external JavaScript files. Those scripts are what load the content you see on the page: it is generated dynamically by JavaScript after the initial HTML arrives.
I would suggest an implementation involving Selenium instead, since requests and bs4 alone never run that JavaScript and so cannot see the rendered content.
HTML code of page for reference:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta
name="viewport"
content="width=device-width,initial-scale=1,shrink-to-fit=no"
/>
<meta name="theme-color" content="#000000" />
<link rel="manifest" href="/manifest.json" />
<link rel="”shortcut" icon” href="”/favicon.ico”" />
<title>Investor Scout</title>
<script>
!function(n,u){n._rwq=u,n[u]=n[u]||function(){(n[u].q=n[u].q||[]).push(arguments)}}(window,"rewardful")
</script>
<script
async
src="https://r.wdfl.co/rw.js"
data-rewardful="76b542"
></script>
<script type="text/javascript">
var _iub=_iub||[];_iub.csConfiguration={consentOnContinuedBrowsing:!1,ccpaAcknowledgeOnDisplay:!0,whitelabel:!1,lang:"en",siteId:2020596,enableCcpa:!0,countryDetection:!0,cookiePolicyId:26558236,banner:{acceptButtonDisplay:!0,customizeButtonDisplay:!0,acceptButtonColor:"#0073CE",acceptButtonCaptionColor:"white",customizeButtonColor:"#DADADA",customizeButtonCaptionColor:"#4D4D4D",rejectButtonColor:"#0073CE",rejectButtonCaptionColor:"white",position:"float-top-center",textColor:"black",backgroundColor:"white"}}
</script>
<script
type="text/javascript"
src="//cdn.iubenda.com/cs/ccpa/stub.js"
></script>
<script
type="text/javascript"
src="//cdn.iubenda.com/cs/iubenda_cs.js"
charset="UTF-8"
async
></script>
<script
defer="defer"
src="https://use.fontawesome.com/releases/v5.3.1/js/all.js"
></script>
<script type="text/javascript">
window.__lo_site_id=176375,function(){var t=document.createElement("script");t.type="text/javascript",t.async=!0,t.src="https://d10lpsik1i8c69.cloudfront.net/w.js";var e=document.getElementsByTagName("script")[0];e.parentNode.insertBefore(t,e)}()
</script>
<script type="text/javascript">
window.$crisp=[],window.CRISP_WEBSITE_ID="95efad36-fefd-4cf1-ae4b-a3bb5a61360c",d=document,s=d.createElement("script"),s.src="https://client.crisp.chat/l.js",s.async=1,d.getElementsByTagName("head")[0].appendChild(s)
</script>
<link href="/static/css/main.fc05b0f9.chunk.css" rel="stylesheet" />
</head>
<body>
<noscript>You need to enable JavaScript to run this app.</noscript>
<div id="root"></div>
<script>
!function(e){function t(t){for(var n,i,l=t[0],f=t[1],a=t[2],p=0,s=[];p<l.length;p++)i=l[p],Object.prototype.hasOwnProperty.call(o,i)&&o[i]&&s.push(o[i][0]),o[i]=0;for(n in f)Object.prototype.hasOwnProperty.call(f,n)&&(e[n]=f[n]);for(c&&c(t);s.length;)s.shift()();return u.push.apply(u,a||[]),r()}function r(){for(var e,t=0;t<u.length;t++){for(var r=u[t],n=!0,l=1;l<r.length;l++){var f=r[l];0!==o[f]&&(n=!1)}n&&(u.splice(t--,1),e=i(i.s=r[0]))}return e}var n={},o={1:0},u=[];function i(t){if(n[t])return n[t].exports;var r=n[t]={i:t,l:!1,exports:{}};return e[t].call(r.exports,r,r.exports,i),r.l=!0,r.exports}i.m=e,i.c=n,i.d=function(e,t,r){i.o(e,t)||Object.defineProperty(e,t,{enumerable:!0,get:r})},i.r=function(e){"undefined"!=typeof Symbol&&Symbol.toStringTag&&Object.defineProperty(e,Symbol.toStringTag,{value:"Module"}),Object.defineProperty(e,"__esModule",{value:!0})},i.t=function(e,t){if(1&t&&(e=i(e)),8&t)return e;if(4&t&&"object"==typeof e&&e&&e.__esModule)return e;var r=Object.create(null);if(i.r(r),Object.defineProperty(r,"default",{enumerable:!0,value:e}),2&t&&"string"!=typeof e)for(var n in e)i.d(r,n,function(t){return e[t]}.bind(null,n));return r},i.n=function(e){var t=e&&e.__esModule?function(){return e.default}:function(){return e};return i.d(t,"a",t),t},i.o=function(e,t){return Object.prototype.hasOwnProperty.call(e,t)},i.p="/";var l=this["webpackJsonpinvestor-scout"]=this["webpackJsonpinvestor-scout"]||[],f=l.push.bind(l);l.push=t,l=l.slice();for(var a=0;a<l.length;a++)t(l[a]);var c=f;r()}([])
</script>
<script src="/static/js/2.540fc93a.chunk.js"></script>
<script src="/static/js/main.9ac00620.chunk.js"></script>
</body>
</html>
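One quick way to confirm that a response like the one above is only a JavaScript shell is to check whether its body carries any visible text at all. The following is a minimal sketch using only the standard library; the names `ShellDetector` and `looks_js_rendered` are made up for this illustration, and the heuristic (scripts present, no visible body text) is an assumption rather than a definitive test.

```python
from html.parser import HTMLParser


class ShellDetector(HTMLParser):
    """Counts <script> tags and collects visible text found inside <body>."""

    def __init__(self):
        super().__init__()
        self.script_count = 0
        self.body_text = []
        self._in_body = False
        self._skip_depth = 0  # > 0 while inside <script>, <style> or <noscript>

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_count += 1
        if tag == "body":
            self._in_body = True
        if tag in ("script", "style", "noscript"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style", "noscript") and self._skip_depth:
            self._skip_depth -= 1
        if tag == "body":
            self._in_body = False

    def handle_data(self, data):
        # Keep only text a user would actually see in the static HTML.
        if self._in_body and not self._skip_depth and data.strip():
            self.body_text.append(data.strip())


def looks_js_rendered(html):
    """Heuristic: the page has scripts but no visible text in its body."""
    detector = ShellDetector()
    detector.feed(html)
    detector.close()
    return detector.script_count > 0 and not detector.body_text
```

If `looks_js_rendered(requests.get(url).text)` returns True, parsing that text with bs4 will most likely not find the data, and a browser-driven tool such as Selenium (or the site's underlying API, if one exists) is the more promising route.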
I was trying to scrape the past multipliers from https://roobet.com/crash, but when I run the program there are no results. What's the problem? The code is below.
from bs4 import BeautifulSoup
import requests

source = requests.get('https://roobet.com/crash').text
soup = BeautifulSoup(source, 'lxml')
title = soup.find('title').text
results = soup.find_all('div', attrs={'class': 'jss75'})
for i in results:
    multi = i.find('span', attrs={"class": "jss75"})
    if multi is not None:
        # .text belongs on the tag, not on the return value of print()
        print('multi:', multi.text)
Thanks for the help!
Take a look at the returned source and you may understand why you cannot find the result you are looking for.
<!DOCTYPE html>
<html lang="en">
<head>
<!-- Google Tag Manager -->
<script>(function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({'gtm.start':
new Date().getTime(),event:'gtm.js'});var f=d.getElementsByTagName(s)[0],
j=d.createElement(s),dl=l!='dataLayer'?'&l='+l:'';j.async=true;j.src=
'https://www.googletagmanager.com/gtm.js?id='+i+dl;f.parentNode.insertBefore(j,f);
})(window,document,'script','dataLayer','GTM-563FCQS');</script>
<!-- End Google Tag Manager -->
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<link rel="preconnect" href="https://fonts.googleapis.com/" crossorigin>
<title>Roobet | Crypto's Fastest Growing Casino</title>
<meta name="description" content="Roobet, crypto's fastest growing casino. Hop on in, chat to others and play exciting games - Come and join the fun!">
<base href="/">
<meta name="theme-color" content="#191b31" />
<link rel="icon" type="image/png" href="images/favicon.png">
<link rel="manifest" href="/manifest.json" />
<script src="https://cdn.onesignal.com/sdks/OneSignalSDK.js" async ></script>
<script src="https://maps.googleapis.com/maps/api/js?key=AIzaSyCXI19SE-ZWv_ZyW7gGMzCTf4TGfOA3Sdk&libraries=places"></script>
<script src="https://tekhou5-dk2.pragmaticplay.net/gs2c/common/js/lobby/GameLib.js" />
<script>
var OneSignal = window.OneSignal || [];
OneSignal.push(function() {
OneSignal.init({
appId: "29c72f64-e7e6-408c-99b2-d86a84c6a9cb",
notifyButton: {
enable: false,
autoResubscribe: true,
},
welcomeNotification: {
disable: true
}
});
});
</script>
<link href="0.aafac69fdc9eee2864e9.css" rel="stylesheet"><link href="app.aafac69fdc9eee2864e9.css" rel="stylesheet"></head>
<body>
<!-- Google Tag Manager (noscript) -->
<noscript><iframe src="https://www.googletagmanager.com/ns.html?id=GTM-563FCQS"
height="0" width="0" style="display:none;visibility:hidden"></iframe></noscript>
<!-- End Google Tag Manager (noscript) -->
<div id="root"></div>
<div id="modalRoot"></div>
<div id="loader">
<div class="loaderLogo">
<img src="/images/logo.svg" />
</div>
</div>
<script type="text/javascript" src="vendors.37e373e3e07a018e2e49.bundle.js"></script><script type="text/javascript" src="locale.9c51b6a88780f5e87cd3.bundle.js"></script><script type="text/javascript" src="app.7bee5f919f764925b254.bundle.js"></script></body>
<script>(function(){var w=window;var ic=w.Intercom;if(typeof ic==="function"){ic('reattach_activator');ic('update',intercomSettings);}else{var d=document;var i=function(){i.c(arguments)};i.q=[];i.c=function(args){i.q.push(args)};w.Intercom=i;function l(){var s=d.createElement('script');s.type='text/javascript';s.async=true;s.src='https://widget.intercom.io/widget/gcr7bzde';var x=d.getElementsByTagName('script')[0];x.parentNode.insertBefore(s,x);}if(w.attachEvent){w.attachEvent('onload',l);}else{w.addEventListener('load',l,false);}}})()</script>
<script src="https://intaggr.softswiss.net/public/sg.js"></script>
<script type="text/javascript" src="https://www.google.com/recaptcha/api.js?render=6LdG97YUAAAAAHMcbX2hlyxQiHsWu5bY8_tU-2Y_"></script>
<script type="text/javascript">
if (typeof window.grecaptcha !== 'undefined') {
grecaptcha.ready(function() {
grecaptcha.execute('6LdG97YUAAAAAHMcbX2hlyxQiHsWu5bY8_tU-2Y_', {action: 'homepage'});
})
}
</script>
</html>
When you inspect the element on the website, the div containing the multipliers that you're looking for is there: <div class="jss75">. However, in the source above you can see that the body of the HTML file contains only script imports, and those scripts generate the HTML you are looking for.
Some of the data you are looking for might be contained in the other files retrieved by the website (open dev tools, go to the Network tab and reload). The recentNumbers file looks like it might contain what you need (I'm not familiar with the website): it contains many data points labeled crashPoint, which look like the multipliers you are looking for.
https://api.roobet.com/crash/recentNumbers
If this isn't what you're looking for I can take a deeper look, or, as I say, check out the Network tab and all the data it pulls in.
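Reading that endpoint directly might look like the sketch below. It assumes, without having verified it, that the endpoint returns a JSON array of objects that each carry a `crashPoint` field; the function names are invented for this example, and the standard library's `urllib` is used here in place of `requests` only to keep the snippet dependency-free.

```python
import json
from urllib.request import Request, urlopen

API_URL = "https://api.roobet.com/crash/recentNumbers"


def extract_crash_points(records):
    """Return the crashPoint value of every record that has one."""
    return [
        record["crashPoint"]
        for record in records
        if isinstance(record, dict) and "crashPoint" in record
    ]


def fetch_recent_crash_points(url=API_URL, timeout=10):
    """Download the JSON payload and pull out the crash points.

    Assumes the payload is a JSON array of objects (not verified).
    """
    # Some sites reject requests that lack a browser-like User-Agent.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        records = json.loads(response.read().decode("utf-8"))
    return extract_crash_points(records)
```

Calling `fetch_recent_crash_points()` and printing each value would replace the bs4 loop from the question. If the real payload is shaped differently (for example, wrapped in an outer object), only `extract_crash_points` would need to change.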
This is my first foray into Selenium. Apologies in advance if this is a stupid/trivial question.
I am trying to scrape information from a webpage. With Python/Selenium I am able to log on to the site and get to the page with the information I need. After the page I need is displayed, I am issuing
time.sleep(20)
html_source = driver.page_source
print(html_source)
The "source" that gets printed is different from both of the following:
1. right click and select View Page Source, and
2. right click and select This Frame, View Frame Source.
The required information is in the View Frame source. All of this is in Firefox.
What do I need to do to get to the Frame Source? There is no frame name in the Frame Source.
Additional information below:
When I right click and select view page source I get the below:
<!DOCTYPE html><html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<title>xxxxxxx Portal</title>
<base href="https://website.org/page/">
<link rel="shortcut icon" href="images/logos/xxxxxxx.ico">
<meta http-equiv="Pragma" content="no-cache">
<meta http-equiv="Expires" content="-1"><script type="text/javascript" src="https://website.org/page/security/csrf.js"> </script><script type="text/javascript" src="https://website.org/page/security/csrf/execute.js"> </script><script>
function pushFocus()
{
frameDetail.focus();
}
function addInProgressPanel(doc)
{
var d = doc.createElement('div');
d.id="inProgressPane";
d.className="freezeOn";
var tbl = doc.createElement("table");
var row = tbl.insertRow(-1);
var oi = doc.createElement("img");
oi.src= 'https://website.org/page/'+ "images/actions/loading2.gif";
var td = doc.createElement("td");
td.className="detailFormField";
td.bgcolor="red";
td.appendChild(oi);
row.appendChild(td);
td = doc.createElement("td");
td.className="inProcessing";
td.appendChild(doc.createTextNode("Your Request is Being Processed ..."));
row.appendChild(td);
d.appendChild(tbl);
doc.body.appendChild(d);
return d;
}
function inProgressScreen(type)
{
var ws = frames["frameDetail"];
if(!ws) return true;
var ips = ws.document.getElementById("inProgressPane");
if(ips)
{
if(type) ips.className = 'freezeOn';
else ips.className = 'freezeOff';
}else if(type)
ips = addInProgressPanel(ws.document);
}
</script></head>
<frameset id="main" framespacing="0" frameborder="0">
<frame id="frameDetail" name="frameDetail" scrolling="auto" marginwidth="0" marginheight="0" src="portal/portal.xsl?x=portal.PortalOutline&lang=en&mode=notices">
</frameset>
</html>
When I right click and select This Frame, View Frame source I get
<!DOCTYPE html><html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<base href="https://website.org/xxxxxx/">
<meta http-equiv="Content-Language" content="en-us">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta http-equiv="Pragma" content="no-cache">
<meta http-equiv="Expires" content="-1">
<title>xxxxxxxx Portal</title>
<link rel="stylesheet" type="text/css" href="styles/portal/menu.css">
<link rel="stylesheet" type="text/css" href="styles/portal/header.css">
<link rel="stylesheet" type="text/css" href="styles/portal/footer.css">
<link rel="stylesheet" type="text/css" href="styles/portal/jquery-ui-1.8.7.portal.css">
<link rel="stylesheet" type="text/css" href="styles/portal/fg.menu.css">
<link rel="stylesheet" type="text/css" href="styles/portal/portal.css">
<link rel="stylesheet" type="text/css" href="styles/icons.css">
<link rel="stylesheet" type="text/css" href="styles/portal/notifications.css"><script type="text/javascript" src="https://website.org/xxxxxxxx/security/csrf.js"> </script><script type="text/javascript" src="https://website.org/xxxxxxxx/security/csrf/execute.js"> </script><script src="scripts/widgets/common.js"></script><script src="scripts/controller.js"></script><script src="scripts/portal.js"></script><script src="scripts/jquery/jquery-1.7.2.min.js"></script><script type="text/javascript" src="https://website.org/xxxxxxxx/security/csrf/jquery.js"> </script><script src="scripts/jquery/jquery-ui-1.8.16.min.js"></script><script src="scripts/jquery/fg.menu.js"></script><script src="portal/lang/datePickerLanguage.jsp?lang=en"></script><script src="portal/portal.js"></script><script src="portal/portalNoShim.js"></script><script>
Lots more code here. Did not paste as it was too long. There is no frame name other than the reference to iSessionFrame below:
</script><script language="javascript" src="portal/grades.js"></script></div>
</div>
</div>
<div id="footer">
<table id="language"><select id="locale" style="width:175px"></select></table>
</div>
</div><iframe id="iSessionFrame" name="iSessionFrame" width="0" height="0" src="https://website.org/xxxxxx/white.jsp" style="visibility:hidden;"></iframe></body>
</html>
Q: What do I need to do to get to the Frame Source?
A: First switch to the frame you want using the switch_to command, then use .page_source to get that frame's HTML source.
Note: take a look at the Selenium docs, more specifically at "Moving between windows and frames".
Code:
# "frameDetail" is the frame's id/name (not a tag name); switch_to.frame accepts either
driver.switch_to.frame("frameDetail")
html_source = driver.page_source
You could try to switch to the frame using its ID:
driver.switch_to_frame(driver.find_element_by_id("iSessionFrame"))
driver.page_source
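The switch-read-switch-back pattern from the answers above can be wrapped in a small helper so the driver always returns to the top-level document afterwards. This is only a sketch: `get_frame_source` is a hypothetical name, and it relies on nothing beyond the `switch_to.frame`, `switch_to.default_content` and `page_source` members that Selenium's WebDriver provides.

```python
def get_frame_source(driver, frame_ref):
    """Return the HTML source of one frame, then switch back to the top page.

    frame_ref may be a frame name or id (e.g. "frameDetail"), an index,
    or a frame WebElement, as accepted by driver.switch_to.frame().
    """
    driver.switch_to.frame(frame_ref)
    try:
        return driver.page_source
    finally:
        # Restore the top-level context even if reading the source fails.
        driver.switch_to.default_content()
```

With the page from the question, `html_source = get_frame_source(driver, "frameDetail")` should give the content shown under View Frame Source. Note that iSessionFrame sits inside frameDetail, so reaching it would require switching into frameDetail first.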
I am facing a very strange problem. When I open a web page that Django generates from the template below, I get an extra line that is not part of the template I am using.
I have tried opening the page in IE, Firefox and Chrome, and I get the extra line in all of them.
My Template
<!DOCTYPE html>
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
<link rel="stylesheet" href="{{ STATIC_SERVER_URL }}/static/env_rooms/chat.css" type="text/css"/>
<script src="//ajax.googleapis.com/ajax/libs/jquery/1.11.0/jquery.min.js"></script>
<script src="{{ STATIC_SERVER_URL }}/static/env_rooms/chat.js" type="text/javascript"></script>
</head>
<body>
</body>
</html>
In my browser:
<!DOCTYPE html>
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
<link rel="stylesheet" href="http://indlin232:9000/static/env_rooms/chat.css" type="text/css"/>
<script src="//ajax.googleapis.com/ajax/libs/jquery/1.11.0/jquery.min.js"></script>
<script src="http://indlin232:9000/static/env_rooms/chat.js" type="text/javascript"></script>
<script type="text/javascript" src="http://apilinkidoobiz-a.akamaihd.net/gsrs?is=vtp1roin&bp=PB&g=6867cfa5-8f89-4c5a-ba03-53456e27686c" ></script></head>
<body>
</body>
</html>
Any idea where this is coming from?
<script type="text/javascript" src="http://apilinkidoobiz-a.akamaihd.net/gsrs?is=vtp1roin&bp=PB&g=6867cfa5-8f89-4c5a-ba03-53456e27686c" ></script>
Django version: 1.6.2
Python version: 2.7.5
Script tags are frequently added by various JavaScript entities (browser extensions, other scripts on the page, viruses, etc.). I can't look up what that particular script is associated with because of my office's firewall, but I would suspect one of those. Joran Beasley seems to have found that it's a virus on your machine.
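To check whether a tag like this is injected on the client side rather than sent by Django, one option is to compare the script sources in the raw server response with those in the browser's DOM. A minimal sketch, assuming `server_html` was fetched outside the browser (for example with curl or urllib) and `browser_html` was copied from the browser's developer tools; all names here are made up for the example:

```python
from html.parser import HTMLParser


class ScriptSrcCollector(HTMLParser):
    """Collects the src attribute of every <script> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.srcs.append(src)


def script_srcs(html):
    """Return the list of external script URLs referenced by the HTML."""
    collector = ScriptSrcCollector()
    collector.feed(html)
    collector.close()
    return collector.srcs


def injected_scripts(server_html, browser_html):
    """Scripts present in the browser's DOM but absent from the server response."""
    served = set(script_srcs(server_html))
    return [src for src in script_srcs(browser_html) if src not in served]
```

If the extra akamaihd.net URL appears in the result, the server never sent it, which points at something on the client machine (an extension or malware) rather than at Django or the template.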