I'm a newbie to Python here, so bear with me...
I'm experimenting with a simple OAuth call to the Instagram API. After you register your application, you get your client ID, client secret, etc. The first step in the OAuth process is to direct the user to this authorization URL:
https://api.instagram.com/oauth/authorize/?client_id=CLIENT-ID&redirect_uri=REDIRECT-URI&response_type=code
When I load this URL in a browser with my client ID and Redirect URL, the following URL appears in the browser (for example):
http://instagrram.geometryfletch.com/home.html?code=956237827314ee22092384984938
My question is, how can I replicate what happens in the browser using the Requests module?
When I try the following:
>>> import requests
>>> b = requests.get('https://api.instagram.com/oauth/authorize/?client_id=c918883453360349850498&redirect_uri=http://instagrram.myredirect.com/home.html&response_type=code')
>>> b.text
What I get back is this "garbled" response (I know it's not really garbled; Requests is doing what I tell it and returning something appropriate):
u'<!DOCTYPE html>\n<!--[if lt IE 7]> <html lang="en" class="no-js lt-ie9 lt-ie8 lt-ie7 not-logged-in "> <![endif]-->\n<!--[if IE 7]> <html lang="en" class="no-js lt-ie9 lt-ie8 not-logged-in "> <![endif]-->\n<!--[if IE 8]> <html lang="en" class="no-js lt-ie9 not-logged-in "> <![endif]-->\n<!--[if gt IE 8]><!-->
<html lang="en" class="no-js not-logged-in "> <!--<![endif]-->\n
<head>\n
<meta charset="utf-8">
\n
<meta http-equiv="X-UA-Compatible" content="IE=edge">
\n\n <title>Log in — Instagram</title>\n\n
<script type="text/javascript">\n
WebFontConfig = {\n
custom: {\n
families: [\'proxima-nova:n4,n7\'],\n urls: [\'//instagramstatic-a.akamaihd.net/bluebar/660508e/cache/styles/fonts.css\']\n }\n };\n</script>
\n
<script src="//instagramstatic-a.akamaihd.net/bluebar/660508e/scripts/webfont.js" type="text/javascript"
async></script>
\n\n \n \n
<meta name="robots" content="noimageindex">
\n \n
<meta name="apple-mobile-web-app-capable" content="yes">
\n
<meta name="apple-mobile-web-app-status-bar-style" content="black">
\n\n\n \n
<meta id="viewport" name="viewport"
content="width=device-width, user-scalable=no, initial-scale=1, minimum-scale=1, maximum-scale=1">
\n\n\n
<script type="text/javascript">\n
(function () {\n
var docElement = document.documentElement;\n
var classRE = new RegExp(\'(^|\\\\s)no-js(\\\\s|$)\');\n var className = docElement.className;\n docElement.className = className.replace(classRE, \'$1js$2\');\n })();\n </script>
\n\n \n\n \n \n \n
<link rel="Shortcut Icon" type="image/x-icon"
href="//instagramstatic-a.akamaihd.net/bluebar/660508e/images/ico/favicon.ico">
\n \n \n
<link rel="apple-touch-icon-precomposed"
href="//instagramstatic-a.akamaihd.net/bluebar/660508e/images/ico/apple-touch-icon-precomposed.png">
\n
<link rel="apple-touch-icon-precomposed" sizes="72x72"
href="//instagramstatic-a.akamaihd.net/bluebar/660508e/images/ico/apple-touch-icon-72x72-precomposed.png">
\n
<link rel="apple-touch-icon-precomposed" sizes="114x114"
href="//instagramstatic-a.akamaihd.net/bluebar/660508e/images/ico/apple-touch-icon-114x114-precomposed.png">
\n
<link rel="apple-touch-icon-precomposed" sizes="144x144"
href="//instagramstatic-a.akamaihd.net/bluebar/660508e/images/ico/apple-touch-icon-144x144-precomposed.png">
\n \n \n
<link href="//instagramstatic-a.akamaihd.net/bluebar/660508e/cache/styles/distillery/dialog-main.css"
type="text/css" rel="stylesheet"></link>
\n
<!--[if lt IE 9]>\n <style>\n .dialog-outer {\n min-height: 0;\n }\n </style>\n
<![endif]-->\n\n \n
<script src="//instagramstatic-a.akamaihd.net/bluebar/660508e/scripts/jquery.js" type="text/javascript"></script>
\n
<script src="//instagramstatic-a.akamaihd.net/bluebar/660508e/scripts/bluebar.js" type="text/javascript"></script>
\n
<script type="text/javascript">\n
$(document).ready(function () {\n
$("#id_username").focus();\n
setTimeout(function () {\n
document.getElementById(\'viewport\').setAttribute(\'content\', \'width=\'+ window.innerWidth + \', user-scalable=no\');\n }, 5);\n });\n</script>
\n\n\n
</head>
\n
<body class="p-dialog oauth-login">\n \n \n
<div class="root">\n \n
<section class="dialog-outer">\n
<div class="dialog">\n
<header>\n <h1 class="logo">Instagram</h1>\n \n</header>
\n
<div class="dialog-main">\n \n\n\n\n\n\n\n
<form method="POST" id="login-form" class="adjacent"
action="/accounts/login/?force_classic_login=&next=/oauth/authorize/?client_id=c91888345336494ab7ea7046427ca23e%26redirect_uri=http://instagrram.geometryfletch.com/home.html%26response_type=code">
\n <input type="hidden" name="csrfmiddlewaretoken" ..........
But how can I get Requests to return just the code (code=956237827314ee22092384984938), as happens when you load the URL in a browser?
For production purposes, you should not re-implement OAuth. Have a look at https://pypi.python.org/pypi/oauthlib, an established library that implements the OAuth logic. If you want to stick with requests, there is also https://github.com/requests/requests-oauthlib. Other than that, regarding your question:
My question is, how can I replicate what happens in the browser using
the Requests module?
This is not trivial. First, use curl or a browser plugin for debugging/reconstructing the protocol flow. The second step then is to rebuild the same flow using requests.
Example: When accessing the first URL you mentioned in your question via GET, the server responds with a 302 redirection whose target is given in the Location field of the response header. The response also sets a cookie via the Set-Cookie header field. All of this is important.
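A minimal sketch of that first debugging step, assuming a requests.Session-like object is passed in. Both helper names are hypothetical: first_hop fetches a URL without following redirects so the status code, Location header, and cookies can be inspected, and extract_code pulls the code parameter out of the final redirect URL. This does not perform the login itself; the login form POST (including its CSRF token) would still have to be replayed in between.

```python
from urllib.parse import parse_qs, urlsplit


def first_hop(session, url):
    """Request url without following redirects and report what the server sent.

    `session` is assumed to behave like requests.Session; any cookies set by
    the response stay on the session for later requests.
    """
    response = session.get(url, allow_redirects=False)
    return {
        "status": response.status_code,
        "location": response.headers.get("Location"),
        "cookies": sorted(response.cookies.keys()),
    }


def extract_code(redirect_url):
    """Return the OAuth `code` query parameter from a redirect URL, or None."""
    params = parse_qs(urlsplit(redirect_url).query)
    values = params.get("code")
    return values[0] if values else None
```

With requests installed, first_hop(requests.Session(), authorize_url) would show the 302 target and the cookie names, and extract_code can be applied to the Location of the last redirect in the flow.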
Related
I've been trying to log in to the WhaleWisdom website for the last two weeks, but I'm not able to. I have tried many libraries, such as scrapy, selenium, beautifulsoup, etc.
from requests import Session
from bs4 import BeautifulSoup as bs

with Session() as s:
    # Fetch the login page to obtain the CSRF token embedded in the form.
    login_url = s.get("https://whalewisdom.com/login")
    bs_content = bs(login_url.content, "lxml")
    authenticity_token = bs_content.find("input", {"name": "authenticity_token"})["value"]
    login_data = {
        "authenticity_token": authenticity_token,
        "login": "info@example.com",
        "password": "***********",
        "commit": "Log+In",
    }
    # Post the credentials; the session keeps the cookies for later requests.
    s.post("https://whalewisdom.com/session", data=login_data)
    html_data = bs(s.get("https://whalewisdom.com/dashboard").content, "html.parser")
    print(html_data)
Here is the output:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8"/>
<title>WhaleWisdom Dashboard</title>
<meta content="IE=edge" http-equiv="X-UA-Compatible"/>
<meta content="width=device-width,initial-scale=1.0" name="viewport"/>
<meta content="WhaleWisdom tracks 13F, Schedule 13D, and 13G EDGAR filings by hedge funds. Hedge Fund Whale Backtesting and search tools" name="description"/>
<link href="https://d27mjrcvcy56qq.cloudfront.net/images/apple-touch-icon-76x76.png" rel="apple-touch-icon" sizes="76x76"/>
<link href="https://d27mjrcvcy56qq.cloudfront.net/images/favicon-32x32.png" rel="icon" sizes="32x32" type="image/png"/>
<link href="https://d27mjrcvcy56qq.cloudfront.net/images/favicon-96x96.png" rel="icon" sizes="96x96" type="image/png"/>
<link href="https://d27mjrcvcy56qq.cloudfront.net/images/favicon-16x16.png" rel="icon" sizes="16x16" type="image/png"/>
<meta content="r4hQnHlN2H-GtcIb06YHl49VSipApmfQQWIOvZzfnAU" name="google-site-verification">
<link href="https://fonts.googleapis.com/css?family=Roboto:100,300,400,500,700|Material+Icons" rel="stylesheet" type="text/css"/>
<link href="https://cdn.jsdelivr.net/npm/font-awesome@4.7.0/css/font-awesome.min.css" rel="stylesheet"/>
<link href="https://d27mjrcvcy56qq.cloudfront.net/packs/css/whalewisdom-24fbc382.css" media="screen" rel="stylesheet">
<meta content="authenticity_token" name="csrf-param">
<meta content="XMAu/LK+dKi/zt/XSTvxIJ8jKl2x8Rx47/ZnAiN6MQCcZmSSlUrOLMeURRr54eCfEWHY8oyS8c6GYxLoIMomNQ==" name="csrf-token">
</meta></meta></link></meta></head>
<body>
<noscript>
<strong>We're sorry but the WhaleWisdom Dashboard doesn't work properly without JavaScript enabled. Please enable it to continue.</strong>
</noscript>
<div id="app"></div>
<script src="https://d27mjrcvcy56qq.cloudfront.net/packs/js/whalewisdom-4b32da19479fdebf5332.js"></script>
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-11651599-1', 'auto');
ga('send', 'pageview');
</script>
<script async="" charset="utf-8" src="//ads.investingchannel.com/adtags/WhaleWisdom/quotepages/970x91.js" type="text/javascript"></script>
</body>
</html>
You can see that in the HTML output, at line 23, there is an error-type message inside the <noscript> element stating that the WhaleWisdom dashboard doesn't work properly without JavaScript:
<noscript>
<strong>We're sorry but the WhaleWisdom Dashboard doesn't work properly without JavaScript enabled. Please enable it to continue.</strong>
</noscript>
<div id="app"></div>
I think this is why it is not working: the dashboard content is rendered by JavaScript, which requests does not execute. I can't test it right now because I don't use WhaleWisdom.
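A rough way to check for this situation in a script, sketched with only the standard library. The helper name looks_js_rendered and its heuristic are assumptions of this example: it flags a page whose body has a <noscript> message but no other visible text, which is typical of an empty application shell that JavaScript fills in later.

```python
from html.parser import HTMLParser


class _ShellDetector(HTMLParser):
    """Collect <noscript> text and visible body text outside script/style."""

    def __init__(self):
        super().__init__()
        self._hidden_depth = 0    # currently inside <script> or <style>
        self._noscript_depth = 0  # currently inside <noscript>
        self._in_body = False
        self.noscript_text = []
        self.body_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self._in_body = True
        elif tag in ("script", "style"):
            self._hidden_depth += 1
        elif tag == "noscript":
            self._noscript_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self._in_body = False
        elif tag in ("script", "style") and self._hidden_depth:
            self._hidden_depth -= 1
        elif tag == "noscript" and self._noscript_depth:
            self._noscript_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text or self._hidden_depth:
            return
        if self._noscript_depth:
            self.noscript_text.append(text)
        elif self._in_body:
            self.body_text.append(text)


def looks_js_rendered(html):
    """Heuristic: a <noscript> warning is present and the body has no other text."""
    detector = _ShellDetector()
    detector.feed(html)
    detector.close()
    return bool(detector.noscript_text) and not detector.body_text
```

If this returns True for the dashboard HTML, the data has to come from a browser-driven tool such as Selenium or from whatever requests the page's JavaScript makes in the background.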
I was trying to scrape the past multipliers from https://roobet.com/crash, but when I run the program there are no results. What's the problem? The code is below.
from bs4 import BeautifulSoup
import requests

source = requests.get('https://roobet.com/crash').text
soup = BeautifulSoup(source, 'lxml')
title = soup.find('title').text
results = soup.find_all('div', attrs={'class': 'jss75'})
for i in results:
    multi = i.find('span', attrs={"class": "jss75"})
    if multi is not None:
        # .text belongs on the element, not on print()'s return value
        print('multi:', multi.text)
Thanks for the help!
Take a look at the returned source and you may understand why you cannot find the result you are looking for.
<!DOCTYPE html>
<html lang="en">
<head>
<!-- Google Tag Manager -->
<script>(function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({'gtm.start':
new Date().getTime(),event:'gtm.js'});var f=d.getElementsByTagName(s)[0],
j=d.createElement(s),dl=l!='dataLayer'?'&l='+l:'';j.async=true;j.src=
'https://www.googletagmanager.com/gtm.js?id='+i+dl;f.parentNode.insertBefore(j,f);
})(window,document,'script','dataLayer','GTM-563FCQS');</script>
<!-- End Google Tag Manager -->
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<link rel="preconnect" href="https://fonts.googleapis.com/" crossorigin>
<title>Roobet | Crypto's Fastest Growing Casino</title>
<meta name="description" content="Roobet, crypto's fastest growing casino. Hop on in, chat to others and play exciting games - Come and join the fun!">
<base href="/">
<meta name="theme-color" content="#191b31" />
<link rel="icon" type="image/png" href="images/favicon.png">
<link rel="manifest" href="/manifest.json" />
<script src="https://cdn.onesignal.com/sdks/OneSignalSDK.js" async ></script>
<script src="https://maps.googleapis.com/maps/api/js?key=AIzaSyCXI19SE-ZWv_ZyW7gGMzCTf4TGfOA3Sdk&libraries=places"></script>
<script src="https://tekhou5-dk2.pragmaticplay.net/gs2c/common/js/lobby/GameLib.js" />
<script>
var OneSignal = window.OneSignal || [];
OneSignal.push(function() {
OneSignal.init({
appId: "29c72f64-e7e6-408c-99b2-d86a84c6a9cb",
notifyButton: {
enable: false,
autoResubscribe: true,
},
welcomeNotification: {
disable: true
}
});
});
</script>
<link href="0.aafac69fdc9eee2864e9.css" rel="stylesheet"><link href="app.aafac69fdc9eee2864e9.css" rel="stylesheet"></head>
<body>
<!-- Google Tag Manager (noscript) -->
<noscript><iframe src="https://www.googletagmanager.com/ns.html?id=GTM-563FCQS"
height="0" width="0" style="display:none;visibility:hidden"></iframe></noscript>
<!-- End Google Tag Manager (noscript) -->
<div id="root"></div>
<div id="modalRoot"></div>
<div id="loader">
<div class="loaderLogo">
<img src="/images/logo.svg" />
</div>
</div>
<script type="text/javascript" src="vendors.37e373e3e07a018e2e49.bundle.js"></script><script type="text/javascript" src="locale.9c51b6a88780f5e87cd3.bundle.js"></script><script type="text/javascript" src="app.7bee5f919f764925b254.bundle.js"></script></body>
<script>(function(){var w=window;var ic=w.Intercom;if(typeof ic==="function"){ic('reattach_activator');ic('update',intercomSettings);}else{var d=document;var i=function(){i.c(arguments)};i.q=[];i.c=function(args){i.q.push(args)};w.Intercom=i;function l(){var s=d.createElement('script');s.type='text/javascript';s.async=true;s.src='https://widget.intercom.io/widget/gcr7bzde';var x=d.getElementsByTagName('script')[0];x.parentNode.insertBefore(s,x);}if(w.attachEvent){w.attachEvent('onload',l);}else{w.addEventListener('load',l,false);}}})()</script>
<script src="https://intaggr.softswiss.net/public/sg.js"></script>
<script type="text/javascript" src="https://www.google.com/recaptcha/api.js?render=6LdG97YUAAAAAHMcbX2hlyxQiHsWu5bY8_tU-2Y_"></script>
<script type="text/javascript">
if (typeof window.grecaptcha !== 'undefined') {
grecaptcha.ready(function() {
grecaptcha.execute('6LdG97YUAAAAAHMcbX2hlyxQiHsWu5bY8_tU-2Y_', {action: 'homepage'});
})
}
</script>
</html>
When you inspect the element on the website, the div containing the multipliers you're looking for is there: <div class="jss75">. However, in the source above you can see that the body of the HTML file consists mainly of script imports, which generate the HTML you are looking for.
Some of the data you are looking for might be contained in the other files retrieved by the website (open dev tools, go to the Network tab, and reload). The recentNumbers file looks like it might contain what you need (I'm not familiar with the website). It contains many data points labeled crashPoint, which look like the multipliers you are looking for.
https://api.roobet.com/crash/recentNumbers
If this isn't what you're looking for, I can take a deeper look, or, as I say, check out the Network tab and all the data it pulls in.
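A minimal sketch of pulling the multipliers out of such a response, assuming (not verified) that the endpoint returns JSON in which each entry carries a crashPoint field. The helper collect_crash_points is hypothetical and the sample data is made up; the real schema may differ.

```python
import json


def collect_crash_points(payload):
    """Recursively gather every value stored under a "crashPoint" key."""
    found = []
    if isinstance(payload, dict):
        for key, value in payload.items():
            if key == "crashPoint":
                found.append(value)
            else:
                found.extend(collect_crash_points(value))
    elif isinstance(payload, list):
        for item in payload:
            found.extend(collect_crash_points(item))
    return found


# Made-up sample shaped like the assumed response, not real data.
sample = json.loads('[{"id": 1, "crashPoint": 1.52}, {"id": 2, "crashPoint": 12.07}]')
points = collect_crash_points(sample)
```

With requests, the same function could be applied to requests.get('https://api.roobet.com/crash/recentNumbers').json(), provided the endpoint answers a plain GET. Walking the structure recursively avoids depending on where exactly the entries are nested.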
I have a list of UPC codes and I am trying to write a script to pull information about them from https://www.barcodelookup.com, but the request returns only HTML tags and none of the information I want.
Here is a sample of my code:
import requests
from bs4 import BeautifulSoup

page = requests.get('https://www.barcodelookup.com/075610166101')
soup = BeautifulSoup(page.text, 'html.parser')
bsoup = soup.prettify()
with open('output1.html', 'w') as file:
    file.write(str(bsoup))
with open('output.html', 'w') as file:
    file.write(str(page.text))
Sample output.html:
<!DOCTYPE html>
<!--[if lt IE 7]> <html class="no-js ie6 oldie" lang="en-US"> <![endif]-->
<!--[if IE 7]> <html class="no-js ie7 oldie" lang="en-US"> <![endif]-->
<!--[if IE 8]> <html class="no-js ie8 oldie" lang="en-US"> <![endif]-->
<!--[if gt IE 8]><!-->
<html class="no-js" lang="en-US">
<!--<![endif]-->
<head>
<title>
Attention Required! | Cloudflare
</title>
<meta id="captcha-bypass" name="captcha-bypass"/>
<meta charset="utf-8"/>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
<meta content="IE=Edge,chrome=1" http-equiv="X-UA-Compatible"/>
<meta content="noindex, nofollow" name="robots"/>
<meta content="width=device-width,initial-scale=1" name="viewport"/>
<link href="/cdn-cgi/styles/cf.errors.css" id="cf_styles-css" media="screen,projection" rel="stylesheet" type="text/css"/>
<!--[if lt IE 9]><link rel="stylesheet" id='cf_styles-ie-css' href="/cdn-cgi/styles/cf.errors.ie.css" type="text/css" media="screen,projection" /><![endif]-->
<style type="text/css">
body{margin:0;padding:0}
</style>
<!--[if gte IE 10]><!-->
<script src="/cdn-cgi/scripts/zepto.min.js" type="text/javascript">
</script>
<!--<![endif]-->
<!--[if gte IE 10]><!-->
<script src="/cdn-cgi/scripts/cf.common.js" type="text/javascript">
</script>
<!--<![endif]-->
<style type="text/css">
#cf-wrapper #spinner {width:69px; margin: auto;}
#cf-wrapper #cf-please-wait{text-align:center}
.attribution {margin-top: 32px;}
.bubbles { background-color: #f58220; width:20px; height: 20px; margin:2px; border-radius:100%; display:inline-block; }
#cf-wrapper #challenge-form { padding-top:25px; padding-bottom:25px; }
#cf-hcaptcha-container { text-align:center;}
</style>
</head>
<body>
<div id="cf-wrapper">
<div class="cf-alert cf-alert-error cf-cookie-error" data-translate="enable_cookies" id="cookie-alert">
Please enable cookies.
</div>
<div class="cf-error-details-wrapper" id="cf-error-details">
<div class="cf-wrapper cf-header cf-error-overview">
<h1 data-translate="challenge_headline">
One more step
</h1>
<h2 class="cf-subheadline">
<span data-translate="complete_sec_check">
Please complete the security check to access
</span>
www.barcodelookup.com
</h2>
</div>
Sample output1.html:
<div class="cf-section cf-highlight cf-captcha-container">
<div class="cf-wrapper">
<div class="cf-columns two">
<div class="cf-column">
<div class="cf-highlight-inverse cf-form-stacked">
<form action="/075610166101?__cf_chl_captcha_tk__=10080e641441171d59b24657ed37a7381be4a368-1595778921-0-AS91JaY_1ozqjwuL0cLJj39tDQ8tO-5t6vMnZ4LFD6V9L_k_jFw1qb6NW_KOPGyf53pazgUHKpjsBF0oCu3pWy-n1rks1eGTzPNdPJvDUgly5EfmCU2hfkPgF0u9Mmb0jAt0uNra1wy-xDgG87ZgWd3KvYSj1Jre0DtwvkXITbLAaAdSg5UeBhw4DDEuCxFILAwhLTU3YHEm9F1CbC7cqA-U05kTDiOIBnZngHGBrnOWB9LYl6asezmwfpuzNZTovixMVE8BBKVfIf1gJjllYh7626I1abfYw38uuoIy0viPuN_CtjB8JoBbs2qrix4gXW6PGu9EA5ZPhBw-IQ8csPLN-a0WFRqB3Il-Hz6M6z9Wdb-OHUKOjX37n_fBuQarqU34cgbG4CNpD_7cdn_NUrlJ6xsRZiFV13V2q4zBS4XpPwabA_unBIjziYgIiB-y9hwndtV08bMXxtoSqtNxxev3fNnL_cQ" class="challenge-form" enctype="application/x-www-form-urlencoded" id="challenge-form" method="POST">
<input name="r" type="hidden" value="33260f1c9e17bb57e0d89a1d21e050da58f9c0a0-1595778921-0-Ad2sk2X3qN2WwWLekQkZpeJCOg0H0bI9CHDtAranzrOjQHfchnqyW9dHD3S6CpbKRRrV/9pFNY+jLG7XUks78zi0PsNBHSNwDV4ad2liittfYU5X73GgFmyN3COYAQomUPoPxw+YPyMTRPrR0P6qFUh92fhmLMbivztY8iwFFTppCHO1Kx8Ax+4orJWgb31sJpRrtuasqpgFs9qCAhBgBKzue/BginjozYpNbGDlrdjnWnh+b+SxL+HWxzkFLwxoIWDJ6dMHaZSp/zvBptO5cgBTpPupAYNvcB2O3YGapY0UefpxmhXntG50yXyrQmobqrh4rjuyXgDup3HO8ETKUwnZ37f4NN0LuYA2k9nveVh0j9hqy/P09wbQE8AChLs2/u2uqpTcGyPSpbTOyNo1FjfD+BpE6KqQsL8l9hOtHuHviayTngoqOrOMW6"/>
<input name="cf_captcha_kind" type="hidden" value="h"/>
<input name="vc" type="hidden" value=""/>
<script async="" data-ray="5b8f4df52fe3741d"
I tried to post both output files in full to show the returned information, but the system won't let me.
Websites often use security mechanisms to avoid being scraped. The most basic check is serving content based on the User-Agent header: if a requesting client does not send a browser-like User-Agent (requests identifies itself as python-requests by default), it is treated as an unsupported browser or as a bot/script. So just adding a User-Agent header (mimicking Google Chrome) lets us get content from this site.
Here is your updated script:
import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36'
}
page = requests.get('https://www.barcodelookup.com/075610166101', headers=headers)
soup = BeautifulSoup(page.text, 'html.parser')
bsoup = soup.prettify()
with open('output1.html', 'w') as file:
    file.write(str(bsoup))
with open('output.html', 'w') as file:
    file.write(str(page.text))
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
import os

Game_Pin = input('Enter your PIN: ')
NickNAME = input('Enter your nickname: ')
driver = webdriver.Chrome(executable_path=r"C:\WebDriver\bin\chromedriver.exe")

def Enter_Press(driver):
    driver.find_element_by_xpath("//*[contains(text(), 'Enter')]").click()

def OK_GO(driver):
    driver.find_element_by_xpath("//*[contains(text(), 'OK, go!')]").click()

def Kahoot_Spammer(Game_Pin, NickNAME, driver):
    driver.get('https://kahoot.it/')
    driver.maximize_window()  # For maximizing window
    driver.implicitly_wait(2)  # gives an implicit wait for 2 seconds
    game_pin = driver.find_element_by_xpath("//*[@id='inputSession']")
    game_pin.send_keys(Game_Pin)
    Enter_Press(driver)
    driver.implicitly_wait(2)
    Name = driver.find_element_by_xpath("//*[@id='username']")
    Name.send_keys(NickNAME)
    OK_GO(driver)

Kahoot_Spammer(Game_Pin, NickNAME, driver)
This is the code. It's supposed to open a Chrome browser and navigate to the Kahoot.it website, then take the information you gave it and enter it for you. It works for the first part, entering a game, but once it gets to creating your nickname it cannot detect the OK, go! button.
driver.find_element_by_xpath("//*[contains(text(), 'OK, go!')]").click()
I've inspected the button but cannot seem to find what to put within the code above. Any ideas?
Here is the source code.
<!doctype html>
<!--[if lt IE 7]> <html class="no-js lt-ie9 lt-ie8 lt-ie7" lang="en"> <![endif]-->
<!--[if IE 7]> <html class="no-js lt-ie9 lt-ie8" lang="en"> <![endif]-->
<!--[if IE 8]> <html class="no-js lt-ie9" lang="en"> <![endif]-->
<!--[if IE 9]> <html class="no-js lt-ie10" lang="en"> <![endif]-->
<!--[if gt IE 9]><!--> <html class="no-js" lang="en"> <!--<![endif]-->
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
<title>Kahoot!</title>
<meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=1.0, minimum-scale=1.0"/>
<meta name="viewport" content="initial-scale=1, maximum-scale=1.0, minimum-scale=1.0" media="(device-height: 568px)"/>
<meta name="apple-mobile-web-app-capable" content="yes" />
<meta name="apple-itunes-app" content="app-id=1131203560">
<meta name="description" content="Join a game of kahoot here. Kahoot! is a free game-based learning platform that makes it fun to learn – any subject, in any language, on any device, for all ages!">
<meta name="keywords" content="education, platform, smart phone, tablet, mobile, social, inclusive, HTML5, classroom, engagement, play, game, fun, quiz, multi-player, pedagogy, learning model, learn, gamification." />
<link rel="shortcut icon" href="/shared/theme/kahoot/img/icons/favicon.ico">
<link rel="apple-touch-icon-precomposed" sizes="144x144" href="/shared/theme/kahoot/img/icons/touch_icon_144.png">
<link rel="apple-touch-icon-precomposed" sizes="114x114" href="/shared/theme/kahoot/img/icons/touch_icon_114.png">
<link rel="apple-touch-icon-precomposed" sizes="72x72" href="/shared/theme/kahoot/img/icons/touch_icon_72.png">
<link rel="apple-touch-icon-precomposed" href="/shared/theme/kahoot/img/icons/touch_icon_57.png">
<link rel="stylesheet" type="text/css" href="/shared/css/cloak.css">
<div style="height: 0; width: 0; position: absolute; visibility: hidden">
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"><defs><filter x="-2.2%" y="-2.3%" width="104.4%" height="104.8%" filterUnits="objectBoundingBox" id="a"><feOffset dy="1" in="SourceAlpha" result="shadowOffsetOuter1"/><symbol id="logo-shapes" viewBox="0 0 24 24"><ellipse cx="5.506" cy="18.966" rx="4.953" ry="4.953"/><path d="M12.005 5.902L17.873.033l5.869 5.869-5.869 5.868zm1.443 8.899h8.849v8.849h-8.849zm-2.584-4.977H.146l5.36-8.555z"/></symbol></svg>
</div>
<script src="https://tap-nexus.appspot.com/js/sdk/kahunaAPI_min.js"></script>
<script type="text/javascript">
(function(e,t){var n=e.amplitude||{_q:[],_iq:{}};var r=t.createElement("script");r.type="text/javascript";
r.async=true;r.src="https://d24n15hnbwhuhn.cloudfront.net/libs/amplitude-3.4.0-min.gz.js";
r.onload=function(){e.amplitude.runQueuedFunctions()};var i=t.getElementsByTagName("script")[0];
i.parentNode.insertBefore(r,i);function s(e,t){e.prototype[t]=function(){this._q.push([t].concat(Array.prototype.slice.call(arguments,0)));
return this}}var o=function(){this._q=[];return this};var a=["add","append","clearAll","prepend","set","setOnce","unset"];
for(var u=0;u<a.length;u++){s(o,a[u])}n.Identify=o;var c=function(){this._q=[];return this;
};var p=["setProductId","setQuantity","setPrice","setRevenueType","setEventProperties"];
for(var l=0;l<p.length;l++){s(c,p[l])}n.Revenue=c;var d=["init","logEvent","logRevenue","setUserId","setUserProperties","setOptOut","setVersionName","setDomain","setDeviceId","setGlobalUserProperties","identify","clearUserProperties","setGroup","logRevenueV2","regenerateDeviceId","logEventWithTimestamp","logEventWithGroups"];
function v(e){function t(t){e[t]=function(){e._q.push([t].concat(Array.prototype.slice.call(arguments,0)));
}}for(var n=0;n<d.length;n++){t(d[n])}}v(n);n.getInstance=function(e){e=(!e||e.length===0?"$default_instance":e).toLowerCase();
if(!n._iq.hasOwnProperty(e)){n._iq[e]={_q:[]};v(n._iq[e])}return n._iq[e]};e.amplitude=n;
})(window,document);
</script>
<base href="/">
<script type="text/javascript">
document.write('<scri'+'pt ');
document.write('type="text/javascript" ');
document.write('src="'+'/shared/theme/config.js');
document.write("?"+new Date().getTime()+'">');
document.write('</scri'+'pt>');
</script>
</head>
<body snitch ios7-viewport-fix>
<noscript>
<h1>Kahoot! needs JavaScript to work</h1>
<p>
To use Kahoot!, you need to have JavaScript enabled in your browser. To enable JavaScript, please do the following:
</p>
<ul>
<li>Follow these instructions.</li>
<li>Make sure you have the latest browser.</li>
<li>Turn off or disable the NoScript extension, if you have it.</li>
<li>Contact your IT administrator to allow access to Kahoot! in your security preferences.</li>
</ul>
<p>If you continue to have problems, please let us know by contacting Kahoot! support.</p>
</noscript>
<div id="debug-info" debug-info="dev,test" debug-timestamp></div>
<dev-mode></dev-mode>
<div class="loader" loader></div>
<iframe
id="gameBlockIframe"
style="display:none;"
class="game-block-iframe"
sandbox="allow-scripts allow-same-origin"
scrolling="no">
</iframe>
<div id="mainView" ng-cloak ng-view>
<h1>Join in a Kahoot! here</h1>
<p>To learn more about Kahoot! visit kahoot.com</p>
</div>
<div ng-cloak alerts></div>
<script type="text/javascript" src="/js/bootstrap.js"></script>
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
function gup( name, url ) {
if (!url) url = location.href;
name = name.replace(/[\[]/,"\\\[").replace(/[\]]/,"\\\]");
var regexS = "[\\?&]"+name+"=([^&#]*)";
var regex = new RegExp( regexS );
var results = regex.exec( url );
return results == null ? null : results[1];
}
var clientId = gup('gaId', window.location.search);
if (clientId) {
ga('create', 'UA-35308575-1', 'auto', {'allowLinker': true, 'clientId':gup('gaId', window.location.search)});
ga('create', 'UA-35308575-4', 'auto', {'name': 'legacy', 'clientId':gup('gaId', window.location.search)});
var platform = gup('platform', window.location.search);
if (typeof platform === 'string' && platform == 'iOS') {
window.ga('set', 'appName', 'Kahoot');
window.ga('set', 'appId', 'no.mobitroll.kahoot.controller');
}
if (typeof platform === 'string' && platform == 'Android') {
window.ga('set', 'appName', 'Kahoot');
window.ga('set', 'appId', 'no.mobitroll.kahoot.android');
}
} else {
ga('create', 'UA-35308575-1', 'auto', {'allowLinker': true});
ga('create', 'UA-35308575-4', 'auto', {'name': 'legacy'});
}
ga('send', 'pageview');
ga('legacy.send', 'pageview');
</script>
</body>
</html>
implicitly_wait only needs to be set once, when the driver is initialized. To wait a fixed number of seconds you can use time.sleep(), although the more practical solution is to wait dynamically for the element to be present/clickable using Selenium's WebDriverWait.
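To illustrate what waiting dynamically means, here is a library-free sketch of the polling idea behind an explicit wait. It is not Selenium's implementation; the function name wait_until and its parameters are assumptions of this example.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5, ignored=()):
    """Call condition() repeatedly until it returns a truthy value.

    Exceptions listed in `ignored` are treated as "not ready yet". Raises
    TimeoutError if nothing truthy is returned within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except ignored:
            pass  # the element (or whatever we wait for) is not there yet
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll)
```

In Selenium itself the equivalent is WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//*[contains(text(), 'OK, go!')]"))).click(), with By imported from selenium.webdriver.common.by. Unlike time.sleep(), this returns as soon as the button is ready instead of always waiting the full duration.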
This is my first foray into Selenium. Apologies in advance if this is a stupid/trivial question.
I am trying to scrape information from a webpage. With Python/Selenium I am able to log on to the site and get to the page with the information I need. After the page I need is displayed, I am issuing
time.sleep(20)
html_source = driver.page_source
print(html_source)
The "source" that gets printed is different from both the
right click and select view page source and
right click and select This Frame, View Frame source
The required information is in the View Frame source. All of this is in Firefox.
What do I need to do to get to the Frame Source? There is no frame name in the Frame Source.
Additional information below:
When I right click and select view page source I get the below:
<!DOCTYPE html><html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<title>xxxxxxx Portal</title>
<base href="https://website.org/page/">
<link rel="shortcut icon" href="images/logos/xxxxxxx.ico">
<meta http-equiv="Pragma" content="no-cache">
<meta http-equiv="Expires" content="-1"><script type="text/javascript" src="https://website.org/page/security/csrf.js"> </script><script type="text/javascript" src="https://website.org/page/security/csrf/execute.js"> </script><script>
function pushFocus()
{
frameDetail.focus();
}
function addInProgressPanel(doc)
{
var d = doc.createElement('div');
d.id="inProgressPane";
d.className="freezeOn";
var tbl = doc.createElement("table");
var row = tbl.insertRow(-1);
var oi = doc.createElement("img");
oi.src= 'https://website.org/page/'+ "images/actions/loading2.gif";
var td = doc.createElement("td");
td.className="detailFormField";
td.bgcolor="red";
td.appendChild(oi);
row.appendChild(td);
td = doc.createElement("td");
td.className="inProcessing";
td.appendChild(doc.createTextNode("Your Request is Being Processed ..."));
row.appendChild(td);
d.appendChild(tbl);
doc.body.appendChild(d);
return d;
}
function inProgressScreen(type)
{
var ws = frames["frameDetail"];
if(!ws) return true;
var ips = ws.document.getElementById("inProgressPane");
if(ips)
{
if(type) ips.className = 'freezeOn';
else ips.className = 'freezeOff';
}else if(type)
ips = addInProgressPanel(ws.document);
}
</script></head>
<frameset id="main" framespacing="0" frameborder="0">
<frame id="frameDetail" name="frameDetail" scrolling="auto" marginwidth="0" marginheight="0" src="portal/portal.xsl?x=portal.PortalOutline&lang=en&mode=notices">
</frameset>
</html>
When I right click and select This Frame, View Frame source I get
<!DOCTYPE html><html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<base href="https://website.org/xxxxxx/">
<meta http-equiv="Content-Language" content="en-us">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta http-equiv="Pragma" content="no-cache">
<meta http-equiv="Expires" content="-1">
<title>xxxxxxxx Portal</title>
<link rel="stylesheet" type="text/css" href="styles/portal/menu.css">
<link rel="stylesheet" type="text/css" href="styles/portal/header.css">
<link rel="stylesheet" type="text/css" href="styles/portal/footer.css">
<link rel="stylesheet" type="text/css" href="styles/portal/jquery-ui-1.8.7.portal.css">
<link rel="stylesheet" type="text/css" href="styles/portal/fg.menu.css">
<link rel="stylesheet" type="text/css" href="styles/portal/portal.css">
<link rel="stylesheet" type="text/css" href="styles/icons.css">
<link rel="stylesheet" type="text/css" href="styles/portal/notifications.css"><script type="text/javascript" src="https://website.org/xxxxxxxx/security/csrf.js"> </script><script type="text/javascript" src="https://website.org/xxxxxxxx/security/csrf/execute.js"> </script><script src="scripts/widgets/common.js"></script><script src="scripts/controller.js"></script><script src="scripts/portal.js"></script><script src="scripts/jquery/jquery-1.7.2.min.js"></script><script type="text/javascript" src="https://website.org/xxxxxxxx/security/csrf/jquery.js"> </script><script src="scripts/jquery/jquery-ui-1.8.16.min.js"></script><script src="scripts/jquery/fg.menu.js"></script><script src="portal/lang/datePickerLanguage.jsp?lang=en"></script><script src="portal/portal.js"></script><script src="portal/portalNoShim.js"></script><script>
Lots more code here. Did not paste as it was too long. There is no frame name other than the reference to iSessionFrame below:
</script><script language="javascript" src="portal/grades.js"></script></div>
</div>
</div>
<div id="footer">
<table id="language"><select id="locale" style="width:175px"></select></table>
</div>
</div><iframe id="iSessionFrame" name="iSessionFrame" width="0" height="0" src="https://website.org/xxxxxx/white.jsp" style="visibility:hidden;"></iframe></body>
</html>
Q: What do I need to do to get to the Frame Source?
A: First you must switch to the desired frame using the switch_to command, and then use .page_source to get the HTML source.
Note: take a look at the Selenium docs, more specifically the section on moving between windows and frames.
Code:
driver.switch_to.frame("frameDetail")  # the frame's name/id, not a tag name
driver.page_source
You could try to switch to the frame using its ID:
driver.switch_to.frame("iSessionFrame")
driver.page_source