URL Readable by urllib in Python 2 but not in Python 3 - python
I can read a specific web page in Python 2 quite easily:
>>> import urllib
>>> urllib.urlopen("http://www.pluralsight.com/authors")
<addinfourl at 4566566312 whose fp = <socket._fileobject object at 0x10fd18a50>>
When I try to read the same URL using Python 3, however, I get an exception:
>>> from urllib.request import urlopen
>>> urlopen("http://www.pluralsight.com/authors")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/urllib/request.py", line 153, in urlopen
return opener.open(url, data, timeout)
File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/urllib/request.py", line 461, in open
response = meth(req, response)
File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/urllib/request.py", line 571, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/urllib/request.py", line 499, in error
return self._call_chain(*args)
File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/urllib/request.py", line 433, in _call_chain
result = func(*args)
File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/urllib/request.py", line 579, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
>>>
In Python 3, urllib.request.urlopen is equivalent to Python 2's urllib2.urlopen, and the old urllib.urlopen has been removed.
You can see the differences, and why you're getting an error in Python 3, in this SO question. Basically, urllib2.urlopen (urllib.request.urlopen in Python 3) raises an HTTPError for error status codes such as 403, while Python 2's urllib.urlopen returns the error page's HTML as if it were a normal response.
Hope it helps.
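To see the error page in Python 3 the way Python 2's urllib.urlopen showed it, the HTTPError can be caught and read like a response. The following is a minimal sketch, not the answerer's code: the helper names `summarize_http_error` and `fetch_summary` are made up for illustration, and the demonstration uses a hand-built HTTPError for a placeholder URL so that it runs without network access.

```python
import io
import urllib.error
import urllib.request
from email.message import Message


def summarize_http_error(err, max_chars=60):
    """Return (status code, reason, start of the error page body)."""
    # An HTTPError is also a file-like response, so its body can be read.
    body = err.read().decode("utf-8", errors="replace")
    return err.code, err.reason, body[:max_chars]


def fetch_summary(url, max_chars=60):
    """Return (status, reason, start of body) for success and HTTP errors alike."""
    try:
        with urllib.request.urlopen(url) as response:
            body = response.read().decode("utf-8", errors="replace")
            return response.status, response.reason, body[:max_chars]
    except urllib.error.HTTPError as err:
        return summarize_http_error(err, max_chars)


# Offline demonstration with a hand-built HTTPError (hypothetical URL and body).
fake_error = urllib.error.HTTPError(
    "http://example.invalid/",
    403,
    "Forbidden",
    Message(),
    io.BytesIO(b"<html><body><h1>403 Forbidden</h1></body></html>"),
)
summary = summarize_http_error(fake_error)
print(summary)
```

Calling `fetch_summary(url)` on a real URL would return the same kind of tuple whether the server answers 200 or 403.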
It seems like it wasn't working in Python2 either:
u = urllib.urlopen("http://www.pluralsight.com/authors")
u.read()
#'<html><body><h1>403 Forbidden</h1>\nRequest
#forbidden by administrative rules.\n</body></html>\n\n'
You need to add a user-agent:
import urllib.request
req = urllib.request.Request(
    "http://www.pluralsight.com/authors",
    headers={
        'User-Agent': 'Mozilla/5.0'
    }
)
print(urllib.request.urlopen(req).read())
b'<!DOCTYPE html>\r\n<!--[if IE 8]>\r\n <html class="no-js lt-ie9" lang="en" ng-app="pluralsightModule">\r\n<![endif]-->\r\n<!--[if gt IE 8]><!-->\r\n<html class="no-js" lang="en" ng-app="pluralsightModule" id="ng-app">\r\n<!--<![endif]-->\r\n<head>\r\n <meta charset="utf-8" http-equiv="Content-type" content="text/html;" /><script type="text/javascript">window.NREUM||(NREUM={});NREUM.info = {"beacon":"bam.nr-data.net","errorBeacon":"bam.nr-data.net","licenseKey":"2700af8a3c","applicationID":"3058581","transactionName":"Z1ZRN0EDCEMDABVYWl4cdwxHLANEIQwPRUdfX18GQU0nRRYLDkNGH3pdB1Ya","queueTime":0,"applicationTime":3,"ttGuid":"88B4BF5354B4582F","agent":"js-agent.newrelic.com/nr-593.min.js"}</script><script type="text/javascript">(window.NREUM||(NREUM={})).loader_config={xpid:"VwUGVl5VGwAAUVlXDwA="};window.NREUM||(NREUM={}),__nr_require=function(t,e,n){function r(n){if(!e[n]){var o=e[n]={exports:{}};t[n][0].call(o.exports,function(e){var o=t[n][1][e];return r(o?o:e)},o,o.exports)}return e[n].exports}if("function"==typeof __nr_require)return __nr_require;for(var o=0;o<n.length;o++)r(n[o]);return r}({QJf3ax:[function(t,e){function n(t){function e(e,n,a){t&&t(e,n,a),a||(a={});for(var c=s(e),f=c.length,u=i(a,o,r),d=0;f>d;d++)c[d].apply(u,n);return u}function a(t,e){f[t]=s(t).concat(e)}function s(t){return f[t]||[]}function c(){return n(e)}var f={};return{on:a,emit:e,create:c,listeners:s,_events:f}}function r(){return{}}var o="nr#context",i=t("gos");e.exports=n()},{gos:"7eSDFh"}],ee:[function(t,e){e.exports=t("QJf3ax")},{}],3:[function(t){function e(t){try{i.console&&console.log(t)}catch(e){}}var n,r=t("ee"),o=t(1),i={};try{n=localStorage.getItem("__nr_flags").split(","),console&&"function"==typeof console.log&&(i.console=!0,-1!==n.indexOf("dev")&&(i.dev=!0),-1!==n.indexOf("nr_dev")&&(i.nrDev=!0))}catch(a){}i.nrDev&&r.on("internal-error",function(t){e(t.stack)}),i.dev&&r.on("fn-err",function(t,n,r){e(r.stack)}),i.dev&&(e("NR AGENT IN DEVELOPMENT MODE"),e("flags: 
"+o(i,function(t){return t}).join(", ")))},{1:23,ee:"QJf3ax"}],4:[function(t){function e(t,e,n,i,s){try{c?c-=1:r("err",[s||new UncaughtException(t,e,n)])}catch(f){try{r("ierr",[f,(new Date).getTime(),!0])}catch(u){}}return"function"==typeof a?a.apply(this,o(arguments)):!1}function UncaughtException(t,e,n){this.message=t||"Uncaught error with no additional information",this.sourceURL=e,this.line=n}function n(t){r("err",[t,(new Date).getTime()])}var r=t("handle"),o=t(6),i=t("ee"),a=window.onerror,s=!1,c=0;t("loader").features.err=!0,t(4),window.onerror=e;try{throw new Error}catch(f){"stack"in f&&(t(1),t(5),"addEventListener"in window&&t(2),window.XMLHttpRequest&&XMLHttpRequest.prototype&&XMLHttpRequest.prototype.addEventListener&&t(3),s=!0)}i.on("fn-start",function(){s&&(c+=1)}),i.on("fn-err",function(t,e,r){s&&(this.thrown=!0,n(r))}),i.on("fn-end",function(){s&&!this.thrown&&c>0&&(c-=1)}),i.on("internal-error",function(t){r("ierr",[t,(new Date).getTime(),!0])})},{1:10,2:7,3:11,4:3,5:9,6:24,ee:"QJf3ax",handle:"D5DuLP",loader:"G9z0Bl"}],5:[function(t){t("loader").features.ins=!0},{loader:"G9z0Bl"}],6:[function(t){function e(){}if(window.performance&&window.performance.timing&&window.performance.getEntriesByType){var n=t("ee"),r=t("handle"),o=t(1);t("loader").features.stn=!0,t(2),n.on("fn-start",function(t){var e=t[0];e instanceof Event&&(this.bstStart=Date.now())}),n.on("fn-end",function(t,e){var n=t[0];n instanceof Event&&r("bst",[n,e,this.bstStart,Date.now()])}),o.on("fn-start",function(t,e,n){this.bstStart=Date.now(),this.bstType=n}),o.on("fn-end",function(t,e){r("bstTimer",[e,this.bstStart,Date.now(),this.bstType])}),n.on("pushState-start",function(){this.time=Date.now(),this.startPath=location.pathname+location.hash}),n.on("pushState-end",function(){r("bstHist",[location.pathname+location.hash,this.startPath,this.time])}),"addEventListener"in 
window.performance&&(window.performance.addEventListener("webkitresourcetimingbufferfull",function(){r("bstResource",[window.performance.getEntriesByType("resource")]),window.performance.webkitClearResourceTimings()},!1),window.performance.addEventListener("resourcetimingbufferfull",function(){r("bstResource",[window.performance.getEntriesByType("resource")]),window.performance.clearResourceTimings()},!1)),document.addEventListener("scroll",e,!1),document.addEventListener("keypress",e,!1),document.addEventListener("click",e,!1)}},{1:10,2:8,ee:"QJf3ax",handle:"D5DuLP",loader:"G9z0Bl"}],7:[function(t,e){function n(t){i.inPlace(t,["addEventListener","removeEventListener"],"-",r)}function r(t){return t[1]}var o=(t(1),t("ee").create()),i=t(2)(o),a=t("gos");if(e.exports=o,n(window),"getPrototypeOf"in Object){for(var s=document;s&&!s.hasOwnProperty("addEventListener");)s=Object.getPrototypeOf(s);s&&n(s);for(var c=XMLHttpRequest.prototype;c&&!c.hasOwnProperty("addEventListener");)c=Object.getPrototypeOf(c);c&&n(c)}else XMLHttpRequest.prototype.hasOwnProperty("addEventListener")&&n(XMLHttpRequest.prototype);o.on("addEventListener-start",function(t){if(t[1]){var e=t[1];"function"==typeof e?this.wrapped=t[1]=a(e,"nr#wrapped",function(){return i(e,"fn-",null,e.name||"anonymous")}):"function"==typeof e.handleEvent&&i.inPlace(e,["handleEvent"],"fn-")}}),o.on("removeEventListener-start",function(t){var e=this.wrapped;e&&(t[1]=e)})},{1:24,2:25,ee:"QJf3ax",gos:"7eSDFh"}],8:[function(t,e){var n=(t(2),t("ee").create()),r=t(1)(n);e.exports=n,r.inPlace(window.history,["pushState"],"-")},{1:25,2:24,ee:"QJf3ax"}],9:[function(t,e){var n=(t(2),t("ee").create()),r=t(1)(n);e.exports=n,r.inPlace(window,["requestAnimationFrame","mozRequestAnimationFrame","webkitRequestAnimationFrame","msRequestAnimationFrame"],"raf-"),n.on("raf-start",function(t){t[0]=r(t[0],"fn-")})},{1:25,2:24,ee:"QJf3ax"}],10:[function(t,e){function n(t,e,n){var r=t[0];"string"==typeof r&&(r=new 
Function(r)),t[0]=o(r,"fn-",null,n)}var r=(t(2),t("ee").create()),o=t(1)(r);e.exports=r,o.inPlace(window,["setTimeout","setInterval","setImmediate"],"setTimer-"),r.on("setTimer-start",n)},{1:25,2:24,ee:"QJf3ax"}],11:[function(t,e){function n(){f.inPlace(this,p,"fn-")}function r(t,e){f.inPlace(e,["onreadystatechange"],"fn-")}function o(t,e){return e}function i(t,e){for(var n in t)e[n]=t[n];return e}var a=t("ee").create(),s=t(1),c=t(2),f=c(a),u=c(s),d=window.XMLHttpRequest,p=["onload","onerror","onabort","onloadstart","onloadend","onprogress","ontimeout"];e.exports=a,window.XMLHttpRequest=function(t){var e=new d(t);try{a.emit("new-xhr",[],e),u.inPlace(e,["addEventListener","removeEventListener"],"-",function(t,e){return e}),e.addEventListener("readystatechange",n,!1)}catch(r){try{a.emit("internal-error",[r])}catch(o){}}return e},i(d,XMLHttpRequest),XMLHttpRequest.prototype=d.prototype,f.inPlace(XMLHttpRequest.prototype,["open","send"],"-xhr-",o),a.on("send-xhr-start",r),a.on("open-xhr-start",r)},{1:7,2:25,ee:"QJf3ax"}],12:[function(t){function e(t){if("string"==typeof t&&t.length)return t.length;if("object"!=typeof t)return void 0;if("undefined"!=typeof ArrayBuffer&&t instanceof ArrayBuffer&&t.byteLength)return t.byteLength;if("undefined"!=typeof Blob&&t instanceof Blob&&t.size)return t.size;if("undefined"!=typeof FormData&&t instanceof FormData)return void 0;try{return JSON.stringify(t).length}catch(e){return void 0}}function n(t){var n=this.params,r=this.metrics;if(!this.ended){this.ended=!0;for(var i=0;c>i;i++)t.removeEventListener(s[i],this.listener,!1);if(!n.aborted){if(r.duration=(new Date).getTime()-this.startTime,4===t.readyState){n.status=t.status;var a=t.responseType,f="arraybuffer"===a||"blob"===a||"json"===a?t.response:t.responseText,u=e(f);if(u&&(r.rxSize=u),this.sameOrigin){var d=t.getResponseHeader("X-NewRelic-App-Data");d&&(n.cat=d.split(", ").pop())}}else n.status=0;r.cbTime=this.cbTime,o("xhr",[n,r,this.startTime])}}}function r(t,e){var 
n=i(e),r=t.params;r.host=n.hostname+":"+n.port,r.pathname=n.pathname,t.sameOrigin=n.sameOrigin}if(window.XMLHttpRequest&&XMLHttpRequest.prototype&&XMLHttpRequest.prototype.addEventListener&&!/CriOS/.test(navigator.userAgent)){t("loader").features.xhr=!0;var o=t("handle"),i=t(2),a=t("ee"),s=["load","error","abort","timeout"],c=s.length,f=t(1);t(4),t(3),a.on("new-xhr",function(){this.totalCbs=0,this.called=0,this.cbTime=0,this.end=n,this.ended=!1,this.xhrGuids={}}),a.on("open-xhr-start",function(t){this.params={method:t[0]},r(this,t[1]),this.metrics={}}),a.on("open-xhr-end",function(t,e){"loader_config"in NREUM&&"xpid"in NREUM.loader_config&&this.sameOrigin&&e.setRequestHeader("X-NewRelic-ID",NREUM.loader_config.xpid)}),a.on("send-xhr-start",function(t,n){var r=this.metrics,o=t[0],i=this;if(r&&o){var f=e(o);f&&(r.txSize=f)}this.startTime=(new Date).getTime(),this.listener=function(t){try{"abort"===t.type&&(i.params.aborted=!0),("load"!==t.type||i.called===i.totalCbs&&(i.onloadCalled||"function"!=typeof n.onload))&&i.end(n)}catch(e){try{a.emit("internal-error",[e])}catch(r){}}};for(var u=0;c>u;u++)n.addEventListener(s[u],this.listener,!1)}),a.on("xhr-cb-time",function(t,e,n){this.cbTime+=t,e?this.onloadCalled=!0:this.called+=1,this.called!==this.totalCbs||!this.onloadCalled&&"function"==typeof n.onload||this.end(n)}),a.on("xhr-load-added",function(t,e){var n=""+f(t)+!!e;this.xhrGuids&&!this.xhrGuids[n]&&(this.xhrGuids[n]=!0,this.totalCbs+=1)}),a.on("xhr-load-removed",function(t,e){var n=""+f(t)+!!e;this.xhrGuids&&this.xhrGuids[n]&&(delete this.xhrGuids[n],this.totalCbs-=1)}),a.on("addEventListener-end",function(t,e){e instanceof XMLHttpRequest&&"load"===t[0]&&a.emit("xhr-load-added",[t[1],t[2]],e)}),a.on("removeEventListener-end",function(t,e){e instanceof XMLHttpRequest&&"load"===t[0]&&a.emit("xhr-load-removed",[t[1],t[2]],e)}),a.on("fn-start",function(t,e,n){e instanceof 
XMLHttpRequest&&("onload"===n&&(this.onload=!0),("load"===(t[0]&&t[0].type)||this.onload)&&(this.xhrCbStart=(new Date).getTime()))}),a.on("fn-end",function(t,e){this.xhrCbStart&&a.emit("xhr-cb-time",[(new Date).getTime()-this.xhrCbStart,this.onload,e],e)})}},{1:"XL7HBI",2:13,3:11,4:7,ee:"QJf3ax",handle:"D5DuLP",loader:"G9z0Bl"}],13:[function(t,e){e.exports=function(t){var e=document.createElement("a"),n=window.location,r={};e.href=t,r.port=e.port;var o=e.href.split("://");return!r.port&&o[1]&&(r.port=o[1].split("/")[0].split("#").pop().split(":")[1]),r.port&&"0"!==r.port||(r.port="https"===o[0]?"443":"80"),r.hostname=e.hostname||n.hostname,r.pathname=e.pathname,r.protocol=o[0],"/"!==r.pathname.charAt(0)&&(r.pathname="/"+r.pathname),r.sameOrigin=!e.hostname||e.hostname===document.domain&&e.port===n.port&&e.protocol===n.protocol,r}},{}],14:[function(t,e){function n(t){return function(){r(t,[(new Date).getTime()].concat(i(arguments)))}}var r=t("handle"),o=t(1),i=t(2);"undefined"==typeof window.newrelic&&(newrelic=window.NREUM);var a=["setPageViewName","addPageAction","setCustomAttribute","finished","addToTrace","inlineHit","noticeError"];o(a,function(t,e){window.NREUM[e]=n("api-"+e)}),e.exports=window.NREUM},{1:23,2:24,handle:"D5DuLP"}],gos:[function(t,e){e.exports=t("7eSDFh")},{}],"7eSDFh":[function(t,e){function n(t,e,n){if(r.call(t,e))return t[e];var o=n();if(Object.defineProperty&&Object.keys)try{return Object.defineProperty(t,e,{value:o,writable:!0,enumerable:!1}),o}catch(i){}return t[e]=o,o}var r=Object.prototype.hasOwnProperty;e.exports=n},{}],D5DuLP:[function(t,e){function n(t,e,n){return r.listeners(t).length?r.emit(t,e,n):(o[t]||(o[t]=[]),void o[t].push(e))}var r=t("ee").create(),o={};e.exports=n,n.ee=r,r.q=o},{ee:"QJf3ax"}],handle:[function(t,e){e.exports=t("D5DuLP")},{}],XL7HBI:[function(t,e){function n(t){var e=typeof t;return!t||"object"!==e&&"function"!==e?-1:t===window?0:i(t,o,function(){return r++})}var 
r=1,o="nr#id",i=t("gos");e.exports=n},{gos:"7eSDFh"}],id:[function(t,e){e.exports=t("XL7HBI")},{}],loader:[function(t,e){e.exports=t("G9z0Bl")},{}],G9z0Bl:[function(t,e){function n(){var t=l.info=NREUM.info;if(t&&t.licenseKey&&t.applicationID&&f&&f.body){s(h,function(e,n){e in t||(t[e]=n)}),l.proto="https"===p.split(":")[0]||t.sslForHttp?"https://":"http://",a("mark",["onload",i()]);var e=f.createElement("script");e.src=l.proto+t.agent,f.body.appendChild(e)}}function r(){"complete"===f.readyState&&o()}function o(){a("mark",["domContent",i()])}function i(){return(new Date).getTime()}var a=t("handle"),s=t(1),c=(t(2),window),f=c.document,u="addEventListener",d="attachEvent",p=(""+location).split("?")[0],h={beacon:"bam.nr-data.net",errorBeacon:"bam.nr-data.net",agent:"js-agent.newrelic.com/nr-593.min.js"},l=e.exports={offset:i(),origin:p,features:{}};f[u]?(f[u]("DOMContentLoaded",o,!1),c[u]("load",n,!1)):(f[d]("onreadystatechange",r),c[d]("onload",n)),a("mark",["firstbyte",i()])},{1:23,2:14,handle:"D5DuLP"}],23:[function(t,e){function n(t,e){var n=[],o="",i=0;for(o in t)r.call(t,o)&&(n[i]=e(o,t[o]),i+=1);return n}var r=Object.prototype.hasOwnProperty;e.exports=n},{}],24:[function(t,e){function n(t,e,n){e||(e=0),"undefined"==typeof n&&(n=t?t.length:0);for(var r=-1,o=n-e||0,i=Array(0>o?0:o);++r<o;)i[r]=t[e+r];return i}e.exports=n},{}],25:[function(t,e){function n(t){return!(t&&"function"==typeof t&&t.apply&&!t[i])}var r=t("ee"),o=t(1),i="nr#wrapper",a=Object.prototype.hasOwnProperty;e.exports=function(t){function e(t,e,r,a){function nrWrapper(){var n,i,s,f;try{i=this,n=o(arguments),s=r&&r(n,i)||{}}catch(d){u([d,"",[n,i,a],s])}c(e+"start",[n,i,a],s);try{return f=t.apply(i,n)}catch(p){throw c(e+"err",[n,i,p],s),p}finally{c(e+"end",[n,i,f],s)}}return n(t)?t:(e||(e=""),nrWrapper[i]=!0,f(t,nrWrapper),nrWrapper)}function s(t,r,o,i){o||(o="");var a,s,c,f="-"===o.charAt(0);for(c=0;c<r.length;c++)s=r[c],a=t[s],n(a)||(t[s]=e(a,f?s+o:o,i,s,t))}function 
c(e,n,r){try{t.emit(e,n,r)}catch(o){u([o,e,n,r])}}function f(t,e){if(Object.defineProperty&&Object.keys)try{var n=Object.keys(t);return n.forEach(function(n){Object.defineProperty(e,n,{get:function(){return t[n]},set:function(e){return t[n]=e,e}})}),e}catch(r){u([r])}for(var o in t)a.call(t,o)&&(e[o]=t[o]);return e}function u(e){try{t.emit("internal-error",e)}catch(n){}}return t||(t=r),e.inPlace=s,e.flag=i,e}},{1:24,ee:"QJf3ax"}]},{},["G9z0Bl",4,12,6,5]);</script>\r\n <meta name="viewport" content="width=device-width" />\r\n <meta name="fragment" content="!" />\r\n <title>Authors \xe2\x80\x93 Pluralsight Training</title>\r\n <link rel="stylesheet" href="//s.pluralsight.com/sc/css/app-a7dac6e6.css" />\r\n <link rel="stylesheet" href="//www.pluralsight.com/content/dist/css/fonts-a9675ca7.css" />\r\n <link href=\'//fonts.googleapis.com/css?family=Open+Sans:100,200,300,400,500,600,700,800,900\' rel=\'stylesheet\' type=\'text/css\'>\r\n <!--[if lte IE 9]>\r\n <link rel="stylesheet" href="//s.pluralsight.com/sc/css/ie-app-1bed8c68.css" />\r\n <![endif]-->\r\n <!--[if IE 8]>\r\n <link rel="stylesheet" href="//s.pluralsight.com/sc/css/ie8-62d3a852.css" />\r\n <![endif]-->\r\n <!--[if IE]>\r\n <link rel="stylesheet" href="//s.pluralsight.com/sc/css/ie-7dd5dc87.css" />\r\n <![endif]-->\r\n\r\n <script src="//s.pluralsight.com/sc/js/vendor/custom.modernizr-b4b7741a.js"></script>\r\n\r\n \r\n \r\n \r\n <script src="//cdn.optimizely.com/js/1252788015.js"></script>\r\n\r\n</head>\r\n<body>\r\n <!-- Google Tag Manager -->\r\n<noscript>\r\n <iframe src="//www.googletagmanager.com/ns.html?id=GTM-MNK9CB" height="0" width="0" style="display:none;visibility:hidden"></iframe>\r\n</noscript>\r\n<script>\r\n (function (w, d, s, l, i) {\r\n w[l] = w[l] || [];\r\n w[l].push({ \'gtm.start\': new Date().getTime(), event: \'gtm.js\' });\r\n var f = d.getElementsByTagName(s)[0],\r\n j = d.createElement(s),\r\n dl = l != \'dataLayer\' ? 
\'&l=\' + l : \'\';\r\n j.async = true;\r\n j.src = \'//www.googletagmanager.com/gtm.js?id=\' + i + dl;\r\n f.parentNode.insertBefore(j, f);\r\n })(window, document, \'script\', \'dataLayer\', \'GTM-MNK9CB\');\r\n</script>\r\n<!-- End Google Tag Manager -->\r\n\r\n <input type="hidden" id="pageObjectTag" value="AuthorsPage" />\r\n <div ng-controller="AuthenticationController">\r\n <div ng-include src="\'/header\'"></div>\r\n\r\n \r\n\r\n<!-- HERO UNIT -->\r\n<section class="teal-hex-bg hero">\r\n <div class="row">\r\n <div class="small-12 columns">\r\n <h1 class="medium">Our authors</h1>\r\n <h4 class="normal">Our original courses are authored by an elite group of tech and creative professionals, innovators and leaders. We take pride in only working with the best.</h4>\r\n <h5 class="authors-invite-to-teach"><strong>Want to join us?</strong></h5>\r\n <a class="teal button" href="/teach">Learn more</a>\r\n </div>\r\n </div>\r\n</section><!-- /HERO UNIT -->\r\n<!-- SECTION TITLE -->\r\n<section class="band" ng-controller="AuthorsController">\r\n\r\n <div class="row">\r\n <div class="small-12 columns">\r\n <div loading show="loading"></div>\r\n <div class="author-group" ng-cloak ng-repeat="(key, value) in authors">\r\n <p class="underline">{{key.toUpperCase()}}</p>\r\n <ul class="inline-list" >\r\n <li ng-repeat="author in value"><a class="panel" ng-href="/author/{{author.handle}}">{{author.fullName}}</a></li>\r\n </ul>\r\n </div>\r\n </div>\r\n </div>\r\n\r\n</section>\r\n\r\n </div>\r\n <footer ng-controller="FooterController">\r\n <div class="row">\r\n <!-- MAIN FOOTER STUFF -->\r\n <div class="large-4 columns">\r\n <img src="//s.pluralsight.com/sc/img/layout/logo-grey-v3.png" class="secondary-logo" />\r\n <p>\r\n Our mission is to publish high quality online training courses for professional developers, IT admins and creative artists. 
Every day.\r\n </p>\r\n <!-- facebook -->\r\n <a class="facebook social button" href="http://www.facebook.com/pluralsight" target="_blank" rel="nofollow">\r\n <span class="icon">\r\n <i class="social fi-social-facebook"></i>\r\n </span>\r\n Facebook\r\n <span ng-class="{\'number\': social.likes != undefined}" ng-cloak>{{social.likes | number}}</span>\r\n </a>\r\n <!-- twitter -->\r\n <a class="twitter social button" href="http://twitter.com/pluralsight" target="_blank" rel="nofollow">\r\n <span class="icon">\r\n <i class="social fi-social-twitter"></i>\r\n </span>\r\n Twitter\r\n <span ng-class="{\'number\': social.followers != undefined}" ng-cloak>{{social.followers | number}}</span>\r\n </a>\r\n <!-- google+ -->\r\n <a class="google social button" href="http://plus.google.com/+pluralsight" target="_blank" rel="nofollow">\r\n <span class="icon">\r\n <i class="social fi-social-google-plus"></i>\r\n </span>\r\n Google+\r\n <span ng-class="{\'number\': social.plusOnes != undefined}" ng-cloak>{{social.plusOnes | number}}</span>\r\n </a>\r\n <!-- newsletter -->\r\n <p>Subscribe to our newsletter for weekly updates.</p>\r\n <div class="row collapse signup-form">\r\n <form action="https://go.pardot.com/l/36882/2014-08-27/yj3h" method="POST">\r\n <div class="small-8 columns">\r\n <input type="text" placeholder="Email" name="UserInfo.Email" ng-focus="newsletterEmailFocus()" />\r\n </div>\r\n <div class="small-4 columns">\r\n <input class="button postfix" type="submit" name="submit" value="Submit">\r\n </div>\r\n </form>\r\n\r\n </div>\r\n </div>\r\n <!-- SITE MAP -->\r\n <div class="large-7 large-offset-1 columns">\r\n <div class="row">\r\n <div class="large-4 columns">\r\n <h5>Learn</h5>\r\n <ul class="side-nav">\r\n <li>Browse Courses</li>\r\n <li>Learning Paths</li>\r\n </ul>\r\n <h5>Products</h5>\r\n <ul class="side-nav">\r\n <li>Individual Plans</li>\r\n <li>Business Plans</li>\r\n <li>Free Trial</li>\r\n <li>Academic</li>\r\n <li>Government</li>\r\n </ul>\r\n 
</div>\r\n <div class="large-4 columns">\r\n <h5>Community</h5>\r\n <ul class="side-nav">\r\n <li>Free Kids Courses</li>\r\n <li>Official Blog</li>\r\n <li>Study Groups</li>\r\n <li>UG & Event Sponsorships</li>\r\n </ul>\r\n <h5>Support</h5>\r\n <ul class="side-nav">\r\n <li>Ask Support a Question</li>\r\n <li>Suggest a Course</li>\r\n <li>Support & Feedback</li>\r\n <li>Knowledge Base / FAQ</li>\r\n <li>Terms of Use</li>\r\n </ul>\r\n </div>\r\n <div class="large-4 columns">\r\n <h5>Features</h5>\r\n <ul class="side-nav">\r\n <li>Mobile Apps</li>\r\n <li>Offline Viewing</li>\r\n </ul>\r\n <h5>About</h5>\r\n <ul class="side-nav">\r\n <li>Contact Us</li>\r\n <li>Press Center</li>\r\n <li>About Us</li>\r\n <li>Authors</li>\r\n <li>Teach</li>\r\n <li>Jobs at Pluralsight</li>\r\n </ul>\r\n </div>\r\n </div>\r\n </div>\r\n </div>\r\n </footer>\r\n\r\n <script src="//s.pluralsight.com/sc/js/bundled/vendor-46f7b492.js"></script>\r\n <script src="//s.pluralsight.com/sc/js/bundled/app-d238c035.js"></script>\r\n\r\n\r\n <script type="text/javascript">\r\n pluralsightModule.factory(\'baseUrls\', function () {\r\n return {\r\n dataUrl: \'/data\',\r\n mvcUrl: \'//www.pluralsight.com/a\',\r\n mainWebUrl: \'//www.pluralsight.com/training\',\r\n staticCdnUrl: \'http://s.pluralsight.com\',\r\n staticUrl: \'//www.pluralsight.com\',\r\n contentUrl: \'//s.pluralsight.com/sc\'\r\n };\r\n })\r\n .factory(\'validationService\', function () {\r\n return {\r\n emailAddressPattern: \'/^[a-zA-Z0-9'._%+-]+#[a-zA-Z0-9-][a-zA-Z0-9.-]*\\.[a-zA-Z]{2,63}$/\'\r\n };\r\n })\r\n .factory(\'settingsProvider\', function ($resource) {\r\n return {\r\n featureToggleMarketoFormHandlers: String(false) == \'true\',\r\n featureToggleLinkedIn: String(false) == \'true\'\r\n };\r\n });\r\n \r\n\r\n </script>\r\n\r\n <script type="text/javascript">\r\n var hero = $(".hero") || $("header");\r\n hero.after(\'<div class="global-message-bar" ng-cloak ng-controller="MessageBarController" ng-show="hasMessage()"><div 
class="row"><div class="small-12 columns"><i class="fi-x"></i><span class="message-text">{{getMessage()}}</span></div></div></div>\');\r\n </script>\r\n\r\n \r\n</body>\r\n</html>\r\n'
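A likely reason the header matters is that urllib identifies itself as "Python-urllib/<version>" by default, which some sites reject. The sketch below only inspects and overrides that default without making any request. The helper names `default_user_agent` and `opener_with_user_agent` are illustrative, and whether a given site accepts a particular User-Agent is an assumption to verify.

```python
import urllib.request


def default_user_agent():
    """Return the User-Agent string a freshly built opener would send."""
    opener = urllib.request.build_opener()
    return dict(opener.addheaders).get("User-agent")


def opener_with_user_agent(user_agent="Mozilla/5.0"):
    """Build an opener that sends the given User-Agent on every request."""
    opener = urllib.request.build_opener()
    opener.addheaders = [("User-agent", user_agent)]
    return opener


print(default_user_agent())
custom = opener_with_user_agent()
print(custom.addheaders)
```

An opener built this way can be used directly with `custom.open(url)`, or installed globally with `urllib.request.install_opener(custom)` so that plain `urlopen` calls pick up the header.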
Or use requests:
import requests
r = requests.get("http://www.pluralsight.com/authors")
print(r.content)
Related
Keyword argument repeated in python Flask
I'm trying to build a restaurant list site using Flask. This is part of my application.py code:
@application.route("/list.html")
def list_restaurants():
    page = request.args.get("page", 0, type=int)
    limit = 4
    category = request.args.get("category", "all")
    price = request.args.get("price", "all")
    area = request.args.get("area", "all")
    start_idx = limit*page
    end_idx = limit*(page+1)
    if category=="all" and price=="all" and area=="all":
        data = DB.get_restaurants()
    else:
        if category != "all" and price=="all" and area=="all":
            data = DB.get_restaurants_bycategory(category)
        elif price != "all" and category=="all" and area=="all":
            data = DB.get_restaurants_byprice(price)
        elif area != "all" and category=="all" and price=="all":
            data = DB.get_restaurants_byarea(area)
        else:
            data = DB.get_restaurants()
    tot_count = len(data)
    if tot_count<=limit:
        data = dict(list(data.items())[:tot_count])
    else:
        data = dict(list(data.items())[start_idx:end_idx])
    data = dict(sorted(data.items(), key=lambda x:x[1]['res_name'], reverse=False))
    #print(data)
    page_count=len(data)
    return render_template(
        "list.html",
        datas=data.items(),
        total=tot_count,
        limit=limit,
        page=page,
        page_count=math.ceil(tot_count/4),
        category=category,
        price=price,
        area=area)
This is the Python code that renders the HTML page where the error occurs.
The HTML page (list.html): <!DOCTYPE html> <head> <meta charset="UTF-8" /> <title>search</title> <script src="https://code.jquery.com/jquery-latest.min.js"></script> <script src="{{ url_for('static', filename='main.js') }}" defer></script> <script src="https://code.jquery.com/jquery-latest.min.js"></script> <style src="{{ url_for('static', filename='index.css') }}"></style> </head> <body> <div class="header" id="logo" onclick="location.href='list.html'"> <img src="/static/YomoJomoLogo.png" width="150px" /> </div> <div class="contents"> <div class="searchbar"> <form> <div class="searchbox"> <a style="color: black;">검색</a> <input type="text" name="search" style="width: 80%; height: 30px;" placeholder="Search by restaurant name or menu name." /> <input type="button" name="search" onclick="location.href='search3.html'" name="search" value="search" /> </div> <div class="login"> <div></div> <input class="loginbutton" type="button" onclick="location.href='login.html'" name="login" value="login" /> <input class="regbutton" type="button" onclick="location.href='register_restaurant.html'" name="register" value="register" /> </div> </form> </div> <br /><br /><br /> <nav> <script> $(document).ready(function () { //alert("{{category}}"); $('#category option:contains("{{category}}")').prop('selected', true); }); </script> <div class="menu"> <ul> <li> <a>Category</a> <ul id="category" name="category" onchange="location=this.value"> <li> <a href="{{url_for('list_restaurants', page=i, category='Korean', price='all', area='all')}}" >Korean</a > </li> <li> <a href="{{url_for('list_restaurants', page=i, category='Italian', price='all', area='all')}}" >Italian</a > </li> <li> <a href="{{url_for('list_restaurants', page=i, category='Chinese', price='all', area='all')}}" >Chinese</a > </li> <li> <a href="{{url_for('list_restaurants', page=i, category='Japanese', price='all', area='all')}}" >Japanese</a > </li> <li> <a href="{{url_for('list_restaurants', page=i, category='Cafeteria', 
price='all', area='all')}}" >Cafeteria</a > </li> </ul> </li> <li> <a>Price</a> <ul id="price" name="price" onchange="location=this.value"> <li> <a href="{{url_for('list_restaurants', page=i, price='below 5', category='all', area='all')}}" >below 5</a > </li> <li> <a href="{{url_for('list_restaurants', page=i, price='5-10', category='all', area='all')}}" >below 10</a > </li> <li> <a href="{{url_for('list_restaurants', page=i, price='10-15', category='all', area='all')}}" >below 15</a > </li> <li> <a href="{{url_for('list_restaurants', page=i, price='15-20', category='all', area='all')}}" >below 20</a > </li> <li> <a href="{{url_for('list_restaurants', page=i, price='above 20', category='all', area='all')}}" >above 20</a > </li> </ul> </li> <li> <a>Area</a> <ul id="area" name="area" onchange="location=this.value"> <li> <a href="{{url_for('list_restaurants', page=i, area='school', category='all', price='all')}}" >school</a > </li> <li> <a href="{{url_for('list_restaurants', page=i, area='front', category='all', price='all')}}" >front</a > </li> <li> <a href="{{url_for('list_restaurants', page=i, area='back', category='all', price='all')}}" >back</a > </li> <li> <a href="{{url_for('list_restaurants', page=i, area='etc', category='all', price='all')}}" >etc</a > </li> </ul> </li> <li style="float: right;">random</li> </ul> </div> </nav> {% if total > 0 %} <p style="text-align: center;"> <br />restaurant list - {{total}}<br /><br /> </p> {% for data in datas %} <div style="float: left; width: 25%;"> <div style="text-align: center;"> <a href="/view_detail/{{data[1].res_name}}/"> <p style="color: black;">{{data[1].res_name}}</p> <img src="/static/image/{{data[1].img_path}}" width="200" /></a ><br /> </div> </div> {% endfor %} <!---pagenation--> <div class="page-wrap" style="clear: both;"> <br /><br /> <div class="page-nation"> <ul> <li> {% for i in range(page_count)%} {{i+1}} {% endfor %} </li> </ul> <br /><br /> </div> </div> {% else %} <p class="ranking"> Search Result 
</p> <div style="margin: 20px;"> <p style="text-align: center;">No result.<br /><br /></p> <div style=" float: left; margin-left: 150px; padding: 40px; border-radius: 5%; text-align: center; background-color: #f3f3f3; " > Register a new restaurant<br /><br /> <input type="button" onclick="location.href='register_restaurant.html'" name="register" style="height: 30px; background-color: #738b5f; border: none; color: white;" value="register a new restaurant" /> </div> <div style=" float: right; margin-right: 150px; padding: 40px; border-radius: 5%; text-align: center; background-color: #f3f3f3; " > Random recommendation<br /><br /> <input type="button" onclick="location.href='search5.html'" name="register" style="height: 30px; background-color: #738b5f; border: none; color: white;" value="random recommendation" /> </div> {% endif %} </div> </div> </body>
This is the error:
Traceback (most recent call last)
File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 2464, in __call__
return self.wsgi_app(environ, start_response)
File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 2450, in wsgi_app
response = self.handle_exception(e)
File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 1867, in handle_exception
reraise(exc_type, exc_value, tb)
File "/usr/local/lib/python3.7/site-packages/flask/_compat.py", line 39, in reraise
raise value
File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 2447, in wsgi_app
response = self.full_dispatch_request()
File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 1952, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 1821, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/usr/local/lib/python3.7/site-packages/flask/_compat.py", line 39, in reraise
raise value
File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 1950, in full_dispatch_request
rv = self.dispatch_request()
File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 1936, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/workspace/flask/application.py", line 108, in list_restaurants
area=area)
File "/usr/local/lib/python3.7/site-packages/flask/templating.py", line 138, in render_template
ctx.app.jinja_env.get_or_select_template(template_name_or_list),
File "/usr/local/lib/python3.7/site-packages/jinja2/environment.py", line 930, in get_or_select_template
return self.get_template(template_name_or_list, parent, globals)
File "/usr/local/lib/python3.7/site-packages/jinja2/environment.py", line 883, in get_template
return self._load_template(name, self.make_globals(globals))
File "/usr/local/lib/python3.7/site-packages/jinja2/environment.py", line 857, in _load_template
template = self.loader.load(self, name, globals)
File "/usr/local/lib/python3.7/site-packages/jinja2/loaders.py", line 127, in load
code = environment.compile(source, name, filename)
File "/usr/local/lib/python3.7/site-packages/jinja2/environment.py", line 636, in compile
return self._compile(source, filename)
File "/usr/local/lib/python3.7/site-packages/jinja2/environment.py", line 601, in _compile
return compile(source, filename, "exec")
File "/workspace/flask/templates/list.html", line 57
^
SyntaxError: keyword argument repeated
It keeps pointing at the same line, not at a specific part of the code. I tried adding blank lines at line 57, and the syntax error still points to the same line. The code was working well and then suddenly stopped. I have no idea how to deal with this 'keyword argument repeated' syntax error. Any advice would be appreciated!
<input type="button" name="search" onclick="location.href='search3.html'" name="search" value="search" /> You're using the name="..." attribute twice in this tag. Remove one of the two.
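The traceback in the question ends in Jinja2's call to compile(source, filename, "exec"), so the message comes from Python itself: it is what the compiler reports when one call receives the same keyword twice. The sketch below reproduces the message outside of Flask. The helper name repeated_keyword_error and the render(...) call are hypothetical and only serve as an illustration; they are not taken from the asker's template.

```python
def repeated_keyword_error(source):
    """Return the SyntaxError message raised when compiling `source`, or None."""
    try:
        # compile() only parses and compiles; nothing in `source` is executed
        compile(source, "<template>", "exec")
    except SyntaxError as exc:
        return exc.msg
    return None


# Hypothetical call that passes the same keyword twice
message = repeated_keyword_error("render(name='search', name='search')")
print(message)  # starts with "keyword argument repeated" (newer Pythons append the name)
```

Because the error is raised while the whole template is compiled, the reported line number may not point at the offending tag, which would match the behaviour described in the question.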
Same consecutive results
I have a table, output_df (shown in the image in the question). I need to detect repetitions of the same consecutive value of "PCR POS/Neg" in output_df. If 3 identical consecutive results occur more than 3 times in output_df, I need to show a "WARNING" error message in my index.html. How can I do it?
views.py from django.shortcuts import render from django.core.files.storage import FileSystemStorage import pandas as pd import datetime from datetime import datetime as td import os from collections import defaultdict from django.contrib import messages import re import numpy as np def home(request): #upload file and save it in media folder if request.method == 'POST': uploaded_file = request.FILES['document'] uploaded_file2 = request.FILES['document2'] if uploaded_file.name.endswith('.xls') and uploaded_file2.name.endswith('.txt'): savefile = FileSystemStorage() #save files name = savefile.save(uploaded_file.name, uploaded_file) name2 = savefile.save(uploaded_file2.name, uploaded_file2) d = os.getcwd() file_directory = d+'/media/'+name file_directory2 = d+'/media/'+name2 cwd = os.getcwd() print("Current working directory:", cwd) results,output_df,new =results1(file_directory,file_directory2) return render(request,"results.html",{"results":results,"output_df":output_df,"new":new}) else: messages.warning(request, ' File was not uploaded. 
Please use the correct type of file') return render(request, "index.html") #read file def readfile(uploaded_file): data = pd.read_excel(uploaded_file, index_col=None) return data def results1(file1,file2): results_list = defaultdict(list) names_loc = file2 listing_file = pd.read_excel(file1, index_col=None) headers = ['Vector Name', 'Date and Time', 'Test ID', 'PCR POS/Neg'] output_df = pd.DataFrame(columns=headers) with open(names_loc, "r") as fp: for line in fp.readlines(): line = line.rstrip("\\\n") full_name = line.split(',') sample_name = full_name[0].split('_mean') try: if len(re.split(r'(^[^\d]+)', sample_name[0])[2]) > 1: sample_id = int(re.split(r'(^[^\d]+)', sample_name[0])[2]) else: sample_id = int(re.split(r'(^[^\d]+)', sample_name[0])[2]) except: sample_id = sample_name[0] try: if listing_file['Test ID'].isin([sample_id]).any(): line_data = listing_file.loc[listing_file['Test ID'].isin([sample_id])] # The name of the file as it is shown in the folder vector_name = line # The data and the time of the taken sample d_t = full_name[1].split('us_')[1].split('_') date_time = td(int(d_t[0]), int(d_t[1]), int(d_t[2]), int(d_t[3]), int(d_t[4]), int(d_t[5])) # Calculating the time frame from the swap to test of samples date_index = list(line_data['Collecting Date from the subject'].iteritems()) for x in date_index: if type(x[1]) is str(): date_time_obj = td.strptime(x[1], '%Y.%m.%d. 
%H:%M') elif type(x[1]) is pd.Timestamp: date_time_obj = x[1] elif type(x[1]) is datetime.datetime: date_time_obj = x[1] frame_time = str(date_time - date_time_obj) if date_time - date_time_obj > datetime.timedelta(hours=48): results_list["List of samples with time frame over 48 :"].append(sample_id) # The Test ID as it writen in the listing file test_id = sample_id # The PCR answer as it was written in the listing file pcr_index = list(line_data['PCR Pos/Neg'].iteritems()) if len(pcr_index) > 1: results_list["List of Samples with more than one attribute in the listing file:"].append(sample_id) for x in pcr_index: pcr_ans = x[1].strip() values_to_add = {'Vector Name': vector_name, 'Date and Time': date_time, 'Test ID': test_id, 'PCR POS/Neg': pcr_ans, 'Time Frame': frame_time } row_to_add = pd.Series(values_to_add) output_df = output_df.append(row_to_add, ignore_index=True) else: results_list["List of Samples not in the listing file:"].append(sample_name[0]) except: print('The template name isnt good: {}'.format(sample_id)) output_df['Date and Time'] = pd.to_datetime(output_df['Date and Time']) new = output_df.groupby([output_df['Date and Time'].dt.date, 'PCR POS/Neg']).size().unstack(fill_value=0) return dict(results_list), output_df.to_html(), new.to_html() index.html <!DOCTYPE html> <html xmlns="http://www.w3.org/1999/html"> <head> {% load static %} <link rel="stylesheet" type="text/css" href="{% static 'css/style.css' %}"/> <meta charset="utf-8"> <meta name="viewport" content="width=device-width, initial-scale=1"> <!-- Bootstrap CSS --> <link href="https://cdn.jsdelivr.net/npm/bootstrap#5.0.2/dist/css/bootstrap.min.css" rel="stylesheet" integrity="sha384-EVSTQN3/azprG1Anm3QDgpJLIm9Nao0Yz1ztcQTwFspd3yD65VohhpuuCOmLASjC" crossorigin="anonymous"> <link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.1.3/css/bootstrap.min.css" integrity="sha384-MCw98/SFnGE8fJT3GXwEOngsV7Zt27NXFoaoApmYm81iuXoPkFOJwJ8ERdknLPMO" crossorigin="anonymous"> <link 
href="https://stackpath.bootstrapcdn.com/bootstrap/4.1.3/css/bootstrap.min.css" rel="stylesheet"/> <link href="https://stackpath.bootstrapcdn.com/font-awesome/4.7.0/css/font-awesome.min.css" rel="stylesheet"/> <link href='https://fonts.googleapis.com/css?family=Poppins' rel='stylesheet'> </head> <body id="ok" style=" width: 150; height: 100vh; background-size: cover;font-family: 'Poppins'; background-repeat:no-repeat; background-image: url('static/images/o.png'); "> <br> <br> <nav class="navbar navbar-expand-lg navbar-white " style=" border-radius: 25px;box-shadow: inset 0 0 5px grey; margin:2em;background-color:#EDF1F6 ; 350px;"> <img src="static/images/mi2.png" style=" width: 350px; " > <div class="container-fluid" style="text-align: center; margin: auto;"> <a class="navbar-brand" href="#"></a> <button class="navbar-toggler" style="color:#0D4171;padding: 1px 1px;" type="button" data-bs-toggle="collapse" data-bs-target="#navbarSupportedContent" aria-controls="navbarSupportedContent" aria-expanded="false" aria-label="Toggle navigation" > <span class="navbar-toggler-icon"></span>⇩</button> <div class="collapse navbar-collapse" id="navbarSupportedContent"> <ul class="navbar-nav" style="text-align: center; margin: auto;"> <form method="POST" enctype="multipart/form-data"> {% csrf_token %} <li class="nav-item" style="#DAE2EA" > <label class="btn btn-outline" style=" color: #0D4171;border-radius: 25px; font-size: 21px; text-align: center;"> <i class="fa fa-cloud-upload" style="font-size: 1.5em;"></i> <br>Listing files (.xls) <input type="file" name="document" id="document" required="required"> </label> <label class="btn btn-outline" style=" color: #0D4171; font-size: 21px;"> <i class="fa fa-cloud-upload" style="font-size: 1.5em;"></i> <br>File Names (.txt) <input type="file" name="document2" id="document2" required="required"> </label> <br> <div style="margin: auto;"> <br> <button class="btn" style="background-color: #0D4171; border: none; ;color: white; padding: 10px 
25px; text-decoration: none; font-size: 16px;font:Poppins Medium; border-radius: 15px; margin-right:65px;" > Upload </button> </div> </li> </form> </ul> {% block messages %} {% if messages %} {% for message in messages %} {% endfor %} {% endif %} {% endblock %} </div> </div> </nav> {%block body%}{% endblock body%} <script src="https://cdn.jsdelivr.net/npm/bootstrap#5.0.2/dist/js/bootstrap.bundle.min.js" integrity="sha384-MrcW6ZMFYlzcLA8Nl+NtUVF0sA7MsXsP1UyJoMp4YLEuNSfAP+JcXn/tWtIaxVXM" crossorigin="anonymous"></script> </body> </html> <div class=""> <h1></h1> <p></p> <p></p> </div> {{variable}} </body> </html> results.html <!DOCTYPE html> <html lang="en"> <head> <link href='https://fonts.googleapis.com/css?family=Poppins' rel='stylesheet'> <meta charset="UTF-8"> <title> Dashboard Result</title> <script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/1.8.3/jquery.min.js"></script> <script type="text/javascript"> $("#btnPrint").live("click", function () { var divContents = $("#dvContainer").html(); var printWindow = window.open('', '', 'height=100,width=200'); printWindow.document.write('<html><head><title> ListingCheckPdf</title>'); printWindow.document.write('</head><body >'); printWindow.document.write(divContents); printWindow.document.write('</body></html>'); printWindow.document.close(); printWindow.print(); }); </script> </head> <body id="ok" style="margin:0.5; padding:0.5em; width: auto;background-size: cover;font-family: 'Poppins'; height:auto; background-size: cover; background-repeat:no-repeat; background-image: url('static/images/o.png'); "> <nav class="navbar navbar-expand-lg navbar-white " style=" border-radius: 25px; margin:2em;background-color:#EDF1F6 ; "> <img src="static/images/mi3.png" style=" width: 250px; margin-left:1em;" > <form id="form1"> <br> <input type="button" style="background-color: #0D4171; border: none; color: white; padding: 8px 18px; text-align: center; text-decoration: none; display: inline-block; font-size: 
15px; margin-left:30px; font-family: 'Poppins'; border-radius: 25px;" value="Download PDF" id="btnPrint" /><br><br><br> <div class="container-fluid" id="dvContainer" style="width: 900px;height: 900px; border-radius:15px; background-color:white; margin:auto; box-shadow: inset 0 0 5px grey;border-radius: 10px; overflow: scroll; /* showing scrollbars */" > <style>table, td, th { margin-left: auto; margin-right: auto; border: 1px solid black; width: 600px; text-align:center; align-items: center;} </style> <br> <div> {% autoescape off %}{{ new }}{% endautoescape %} </div><br><br> <div style="color: hidden; margin: 30px; font: Poppins; font-size: 17px"> {% for key, value in results.items %}<br> {{ key }}<br> {% for elem in value %} <div style="margin-left:50px" > - {{elem }} <br></div> {% endfor %} {% endfor %}</div><br><br><br><br><br>S <div>{% autoescape off %}{{ output_df }}{% endautoescape %}</div> </div> </form> </body> </html>
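One possible way to approach the question above is to count runs of identical consecutive values. The sketch below is a minimal illustration in plain Python, assuming that "3 identical consecutive results, more than 3 times" means "more than 3 runs of at least 3 equal values in a row". The function names count_long_runs and needs_warning are hypothetical, and the input is assumed to be a plain list such as output_df['PCR POS/Neg'].tolist().

```python
from itertools import groupby


def count_long_runs(values, min_run=3):
    """Count runs of at least `min_run` identical consecutive values."""
    # groupby() groups adjacent equal items, so each group is one run
    return sum(1 for _, group in groupby(values) if len(list(group)) >= min_run)


def needs_warning(values, min_run=3, max_runs=3):
    """True when long runs occur more than `max_runs` times (assumed rule)."""
    return count_long_runs(values, min_run) > max_runs


sample = ["Pos", "Pos", "Pos", "Neg", "Neg", "Neg", "Neg", "Pos", "Neg"]
print(count_long_runs(sample))  # 2 (one run of three "Pos", one run of four "Neg")
print(needs_warning(sample))    # False
```

In the view, the check would have to run on the DataFrame column before results1 converts output_df with to_html(), and the warning could then be raised with messages.warning(request, "WARNING ..."). Note also that the message loop in index.html has an empty body, so it would need something like {{ message }} inside it for the text to be displayed.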
Why can't I scrape all data from ecommerce websites?
Actually I'm working on a project where I have to scrape data from e-commerce websites. But I can't access my desired data from these sites. For example, when I want to scrape all list from https://evaly.com.bd/search-results?query=remax%20610d site, I only get <li class="ais-InfiniteHits-sentinel"></li> as output. Besides, when I print HTML code of the site using print(soup.prettify()) The full code is not in the output. Here is my code for all list items : from bs4 import BeautifulSoup import requests link = "https://evaly.com.bd/search-results?query=remax%20610" source = requests.get( link).text soup = BeautifulSoup(source, 'lxml') #print(soup.prettify()) li = soup.find_all("li") print(li) And here is the output when I run print(soup.prettify()) : <!DOCTYPE html> <html> <head> <style data-styled="" data-styled-version="5.2.0"> .lfkzsQ{background-color:white;-webkit-letter-spacing:0.025em;-moz-letter-spacing:0.025em;-ms-letter-spacing:0.025em;letter-spacing:0.025em;font-weight:500;font-size:15px;height:46px;display:-webkit-box;display:-webkit-flex;display:-ms-flexbox;display:flex;-webkit-flex:1;-ms-flex:1;flex:1;padding:0 17px;border:1px solid var(--primary);border-radius:6px 0 0 6px;outline:none;}/*!sc*/ #media (max-width:425px){.lfkzsQ{width:50%;min-width:50%;}}/*!sc*/ data-styled.g87[id="Searchbar__SeachInput-xnx3kr-0"]{content:"lfkzsQ,"}/*!sc*/ .jtCmJd{display:-webkit-box;display:-webkit-flex;display:-ms-flexbox;display:flex;-webkit-flex-direction:row;-ms-flex-direction:row;flex-direction:row;width:100%;height:100%;border-radius:5px;overflow:hidden;background-color:#f6f6f6;}/*!sc*/ data-styled.g88[id="Searchbar__Container-xnx3kr-1"]{content:"jtCmJd,"}/*!sc*/ .BVXNH{cursor:pointer;display:-webkit-box;display:-webkit-flex;display:-ms-flexbox;display:flex;-webkit-align-items:center;-webkit-box-align:center;-ms-flex-align:center;align-items:center;padding-right:29px;padding-left:29px;background:var(--primary);color:#fff;}/*!sc*/ #media 
(max-width:425px){.BVXNH{padding-right:5px;padding-left:5px;}}/*!sc*/ data-styled.g90[id="Searchbar__Button-xnx3kr-3"]{content:"BVXNH,"}/*!sc*/ .XBQPS{font-size:25px;}/*!sc*/ #media (max-width:768px){.XBQPS{font-size:20px;}}/*!sc*/ data-styled.g92[id="Searchbar___StyledMdSearch-xnx3kr-5"]{content:"XBQPS,"}/*!sc*/ .jCIuWZ{display:grid;grid-template-columns:repeat(auto-fill,minmax(200px,1fr));grid-gap:1vw;}/*!sc*/ #media (max-width:768px){.jCIuWZ{grid-template-columns:repeat(auto-fill,minmax(150px,1fr));grid-gap:1vw;}}/*!sc*/ data-styled.g246[id="algoliaConnectComponent__GridP-sc-1c85asy-0"]{content:"jCIuWZ,"}/*!sc*/ .jmbKPm{width:100%;max-width:100px;min-width:0;height:32px;padding:0 16px;-webkit-appearance:none;-moz-appearance:none;appearance:none;background-color:#f5f5fa;font-size:12px;border-radius:4px;}/*!sc*/ data-styled.g247[id="algoliaConnectComponent___StyledInput-sc-1c85asy-1"]{content:"jmbKPm,"}/*!sc*/ .eZHEjD{width:100%;max-width:100px;min-width:0;height:32px;padding:0 16px;-webkit-appearance:none;-moz-appearance:none;appearance:none;background-color:#f5f5fa;font-size:12px;color:#5d6494;border-radius:4px;}/*!sc*/ data-styled.g248[id="algoliaConnectComponent___StyledInput2-sc-1c85asy-2"]{content:"eZHEjD,"}/*!sc*/ .gqxLmc{display:block;height:32px;margin-left:8px;padding-left:16px;padding-right:16px;background:linear-gradient(90deg,#f5515f 0%,#9f041b 100%);color:#fff;border-radius:4px;box-shadow:0 4px 11px 0 rgba(37,44,97,0.15),0 2px 3px 0 rgba(93,100,148,0.2);-webkit-transition:all 0.2s ease-out;transition:all 0.2s ease-out;}/*!sc*/ data-styled.g249[id="algoliaConnectComponent___StyledButton-sc-1c85asy-3"]{content:"gqxLmc,"}/*!sc*/ .gWgnak{display:grid;grid-template-columns:6% 10% auto 25%;-webkit-align-items:center;-webkit-box-align:center;-ms-flex-align:center;align-items:center;grid-template-areas:"logo menu search notification";}/*!sc*/ #media (max-width:768px){.gWgnak{grid-template-columns:25% 25% 25% 25%;grid-template-areas:"menu logo logo user" 
"notification notification notification notification" "search search search search";}.gWgnak .logo{justify-self:center;margin-bottom:1rem;max-width:76px;width:100%;}.gWgnak .menu{position:relative;justify-self:left;}}/*!sc*/ data-styled.g253[id="search-results__GridContainer-sc-6ln6mm-1"]{content:"gWgnak,"}/*!sc*/ .jpeNuX{min-height:3rem;}/*!sc*/ data-styled.g254[id="search-results___StyledDiv-sc-6ln6mm-2"]{content:"jpeNuX,"}/*!sc*/ .ejWvfj{right:30px;bottom:30px;background:linear-gradient(90deg,#f5515f 0%,#9f041b 100%);}/*!sc*/ #media (max-width:767px){.ejWvfj{bottom:75px;}}/*!sc*/ data-styled.g255[id="search-results___StyledButton-sc-6ln6mm-3"]{content:"ejWvfj,"}/*!sc*/ </style> <link href="/static/manifest.json" rel="manifest"/> <title> E-valy Limited | Online Shopping Mall </title> <meta charset="utf-8"/> <meta content="width=device-width, initial-scale=1.0, shrink-to-fit=no, maximum-scale=1.0, user-scalable=no" name="viewport"/> <meta content="E-valy Limited | Online Shopping Mall" property="og:title"/> <meta content="article" property="og:type"/> <meta content="https://s3-ap-southeast-1.amazonaws.com/media.evaly.com.bd/media/2019-08-04_090235.843922android-icon-200x200.png" property="og:image"/> <meta content="450" property="og:image:width"/> <meta content="298" property="og:image:height"/> <meta content="https://evaly.com.bd" property="og:url"/> <meta content="E-valy is an e-commerce site which will be capable of providing every kind of goods and products from every sector to every consumer located in Bangladesh." 
property="og:description"/> <link href="/static/images/icons/favicon.ico" rel="shortcut icon"/> <meta content="evaly://" property="al:android:url"/> <meta content="Evaly" property="al:android:app_name"/> <meta content="bd.com.evaly.evalymarchant" property="al:android:package"/> <meta content="14" name="next-head-count"/> <link as="style" href="/_next/static/css/d48fe9f040f8d2f97c7e.css" rel="preload"/> <link href="/_next/static/css/d48fe9f040f8d2f97c7e.css" rel="stylesheet"/> <link as="script" href="/_next/static/RZ7VftogY8QkgPiLg6BPz/pages/_app.js" rel="preload"/> <link as="script" href="/_next/static/RZ7VftogY8QkgPiLg6BPz/pages/search-results.js" rel="preload"/> <link as="script" href="/_next/static/runtime/webpack-6b3d3cda09a7b5b5debf.js" rel="preload"/> <link as="script" href="/_next/static/chunks/framework.7dfd02d307191d63a37e.js" rel="preload"/> <link as="script" href="/_next/static/chunks/b637e9a5.a705a21716e5b01f8145.js" rel="preload"/> <link as="script" href="/_next/static/chunks/0c9dcbbe.7fbd830a3d684b32423b.js" rel="preload"/> <link as="script" href="/_next/static/chunks/commons.afffbbb0420dd9af938a.js" rel="preload"/> <link as="script" href="/_next/static/chunks/6a597b002e9daab94e2e0adeb626acca4f1f6515.28c9d68d9749974f08e1.js" rel="preload"/> <link as="script" href="/_next/static/chunks/bba5516912876db85383b691379c4486ab998795.071cf6d38264238f2f49.js" rel="preload"/> <link as="script" href="/_next/static/runtime/main-3c89e50e2c7d7034f938.js" rel="preload"/> <link as="script" href="/_next/static/chunks/252f366e.32bec51017e26b1dae31.js" rel="preload"/> <link as="script" href="/_next/static/chunks/95b64a6e.a74dcc7937bf0c356811.js" rel="preload"/> <link as="script" href="/_next/static/chunks/d7eeaac4.afdce0938beabe8eef9a.js" rel="preload"/> <link as="script" href="/_next/static/chunks/2dc48ec14d05924f473dce007726385374c258b9.0a52afc0ae53472a590f.js" rel="preload"/> <link as="script" 
href="/_next/static/chunks/3ad14741d7bfb55e1bcea5bfc6670f090f0855af.b5af8ef4be1abd2d5791.js" rel="preload"/> <link as="script" href="/_next/static/chunks/f6d549f16f3909adbb4f9a302aacab15937bfbda.94c734c42c1caf61b869.js" rel="preload"/> <link as="script" href="/_next/static/chunks/a9dd91d4607a584382b3e8a70a910ee9fb417c65.cabb84905704185ea6f6.js" rel="preload"/> <link as="script" href="/_next/static/chunks/4cbc61372435748121077b3b94e57617b6c8338d.5ae2119035f5c9d8c81c.js" rel="preload"/> <link as="script" href="/_next/static/chunks/411365f484ca502253106aae57d21ae3bb416d15.2f90a1a0cb46996155b4.js" rel="preload"/> <link as="script" href="/_next/static/chunks/69ef8573555555a232f56c2d2a1de6a4101c15d0.d8f92afd6f8ceb35f607.js" rel="preload"/> <link as="script" href="/_next/static/chunks/5d7bf10f24bff82d5530a050de689a7c020a359b.36ce757546da64e3337c.js" rel="preload"/> <link as="script" href="/_next/static/chunks/c8a8012dbcfaeb41f17a667b3a927ba45766e4a2.312913bb8463128a068e.js" rel="preload"/> <link as="script" href="/_next/static/chunks/c1f80152d80b1129cab9e73f90501b8957be40a7.04f2303ad32c2682fab1.js" rel="preload"/> <link as="script" href="/_next/static/chunks/8d4460396e9219a79f33af22e0a8f4fe429b291e.cda426e58b75b281586e.js" rel="preload"/> <link as="script" href="/_next/static/chunks/57f045ed70322177467d785413f62aff844e25d2.ad35b737612878a9f01a.js" rel="preload"/> <link as="script" href="/_next/static/chunks/0378a7d7ac3f1a3f5f0e99380b068fe3a41b14e6.46f0a10d89a7db3593b1.js" rel="preload"/> <link as="script" href="/_next/static/chunks/680dd3e5bbe68ece4bf42804461f8830da8bd4e0.d71300269070cc46823a.js" rel="preload"/> </head> <body> <div id="__next"> <div class="jsx-2334610719 min-h-screen pb-2" style="background-color:#F7F8FA"> <div class="ais-InstantSearch__root"> <div class="topbar bg-gray-100 py-1 text-gray-600 hidden md:block"> <div class="container flex justify-between text-sm"> <div class="flex"> <div class="mr-4"> <a href="https://merchant.evaly.com.bd/"> <svg 
class="w-3 h-3 mr-1 inline align-baseline"> <use href="/static/images/icons.svg#shop" xlink:href="/static/images/icons.svg#shop"> </use> </svg> Merchant zone </a> </div> <div class="mr-4"> <a href="/feeds"> <svg class="w-3 h-3 mr-1 inline align-baseline"> <use href="/static/images/icons.svg#newsfeed" xlink:href="/static/images/icons.svg#newsfeed"> </use> </svg> News Feed </a> </div> <div class="mr-4"> <a href="https://play.google.com/store/apps/details?id=bd.com.evaly.evalyshop"> <svg class="w-3 h-3 mr-1 inline align-baseline"> <use href="/static/images/icons.svg#mobile" xlink:href="/static/images/icons.svg#mobile"> </use> </svg> Download App </a> </div> </div> <div class="flex"> <div class="mr-4"> <a href="https://www.facebook.com/groups/EvalyHelpDesk/"> <svg class="w-3 h-3 mr-1 inline align-baseline"> <use href="/static/images/icons.svg#help" xlink:href="/static/images/icons.svg#help"> </use> </svg> <!-- --> Help </a> </div> <div> <a href="https://www.facebook.com/evaly.com.bd/"> <svg class="w-3 h-3 mr-1 inline align-baseline"> <use href="/static/images/icons.svg#facebook" xlink:href="/static/images/icons.svg#facebook"> </use> </svg> <!-- --> Follow us </a> </div> </div> </div> </div> <div class="bg-white header" style="box-shadow:0 4px 16px 0 rgba(0,0,0,0.04)"> <div class="search-results__Container-sc-6ln6mm-0 hFUCjp container py-5 px-8"> <div class="search-results__GridContainer-sc-6ln6mm-1 gWgnak"> <a class="logo xs:w-1/2" href="/" style="grid-area:logo"> <img alt="logo" class="" src="/static/images/logo.svg" style="max-width:76px"/> </a> <button class="text-2xl menu md:block mb-4 md:mb-0" style="grid-area:menu"> <svg class="m-auto text-gray-700" fill="currentColor" height="1em" stroke="currentColor" stroke-width="0" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"> <path d="M3 18h18v-2H3v2zm0-5h18v-2H3v2zm0-7v2h18V6H3z"> </path> </svg> </button> <div class="md:hidden mb-4" style="grid-area:user;justify-self:right"> <button class="flex 
items-center"> <span class="flex w-full items-center text-gray-700"> <span> <svg color="#1D2531" fill="currentColor" height="25" size="25" stroke="currentColor" stroke-width="0" style="color:#1D2531" viewbox="0 0 1024 1024" width="25" xmlns="http://www.w3.org/2000/svg"> <path d="M858.5 763.6a374 374 0 0 0-80.6-119.5 375.63 375.63 0 0 0-119.5-80.6c-.4-.2-.8-.3-1.2-.5C719.5 518 760 444.7 760 362c0-137-111-248-248-248S264 225 264 362c0 82.7 40.5 156 102.8 201.1-.4.2-.8.3-1.2.5-44.8 18.9-85 46-119.5 80.6a375.63 375.63 0 0 0-80.6 119.5A371.7 371.7 0 0 0 136 901.8a8 8 0 0 0 8 8.2h60c4.4 0 7.9-3.5 8-7.8 2-77.2 33-149.5 87.8-204.3 56.7-56.7 132-87.9 212.2-87.9s155.5 31.2 212.2 87.9C779 752.7 810 825 812 902.2c.1 4.4 3.6 7.8 8 7.8h60a8 8 0 0 0 8-8.2c-1-47.8-10.9-94.3-29.5-138.2zM512 534c-45.9 0-89.1-17.9-121.6-50.4S340 407.9 340 362c0-45.9 17.9-89.1 50.4-121.6S466.1 190 512 190s89.1 17.9 121.6 50.4S684 316.1 684 362c0 45.9-17.9 89.1-50.4 121.6S557.9 534 512 534z"> </path> </svg> </span> </span> </button> </div> <div style="grid-area:search"> <form action="" novalidate="" role="search"> <div class="Searchbar__Container-xnx3kr-1 jtCmJd"> <input class="Searchbar__SeachInput-xnx3kr-0 lfkzsQ" placeholder="Search..." 
type="search" value="remax 610"/> <figure class="Searchbar__Button-xnx3kr-3 BVXNH" color="black"> <svg _css2=" #media (max-width: ,768px,) { , font-size:20px; , } " class="Searchbar___StyledMdSearch-xnx3kr-5 XBQPS" color="white" fill="currentColor" height="1em" stroke="currentColor" stroke-width="0" style="color:white" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"> <path d="M15.5 14h-.79l-.28-.27C15.41 12.59 16 11.11 16 9.5 16 5.91 13.09 3 9.5 3S3 5.91 3 9.5 5.91 16 9.5 16c1.61 0 3.09-.59 4.23-1.57l.27.28v.79l5 4.99L20.49 19l-4.99-5zm-6 0C7.01 14 5 11.99 5 9.5S7.01 5 9.5 5 14 7.01 14 9.5 11.99 14 9.5 14z"> </path> </svg> </figure> </div> </form> </div> <div class="md:pl-4 notification hidden md:block" style="grid-area:notification"> <div class="flex justify-between items-center mb-4 mx-16 md:mx-0 md:mb-0 lg:ml-8"> <button class="text-2xl menu md:hidden"> <svg class="m-auto" fill="currentColor" height="1em" stroke="currentColor" stroke-width="0" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"> <path d="M3 18h18v-2H3v2zm0-5h18v-2H3v2zm0-7v2h18V6H3z"> </path> </svg> </button> <button class="relative"> <svg color="#1D2531" fill="currentColor" height="25" size="25" stroke="currentColor" stroke-width="0" style="color:#1D2531" view How to solve these problems? EDIT : using Selenium and Chrome Driver will be more time consuming for my project
Try the approach below, using requests and json. The script calls the API URL that the page requests on load (found by inspecting the network calls in Chrome) and builds the form data dynamically so that it can walk through every page of results.
What the script does:
1. It builds the form data for the API call. page_no, the query string and maxValuesPerFacet are dynamic, and page_no is incremented by 1 after each page has been processed.
2. requests sends the form data to the URL with a POST request, and the response is parsed as JSON.
3. The script walks the parsed JSON object down to where the product data actually lives.
4. It loops over the hits of each page and prints them, and stops when a page comes back with no hits.
Right now the script prints only a few fields; more are available in the JSON object and can be read the same way as below.
import json
import requests
from urllib3.exceptions import InsecureRequestWarning

# verify=False is used below, so silence the resulting certificate warnings
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)


def scrap_evaly_data():
    QUERY = 'remax%20610'  # query string; change it to fetch another product
    MAX_VALUES_PER_FACET = 10  # sent as maxValuesPerFacet (Algolia's cap on values returned per facet)
    page_no = 0  # first page
    URL = 'https://eza2j926q5-3.algolianet.com/1/indexes/*/queries?x-algolia-agent=Algolia%20for%20JavaScript%20(3.35.1)%3B%20Browser%20(lite)%3B%20react%20(16.13.1)%3B%20react-instantsearch%20(5.7.0)%3B%20JS%20Helper%20(2.28.1)&x-algolia-application-id=EZA2J926Q5&x-algolia-api-key=ca9abeea06c16b7d531694d6783a8f04'  # API URL for querying

    while True:
        print('Hold on creating new form data...')
        # the form data is rebuilt for every page
        form_data = {
            "requests": [{
                "indexName": "products",
                "params": "query=" + QUERY + "&maxValuesPerFacet=" + str(MAX_VALUES_PER_FACET) + "&page=" + str(page_no) + "&highlightPreTag=%3Cais-highlight-0000000000%3E&highlightPostTag=%3C%2Fais-highlight-0000000000%3E&facets=%5B%22price%22%2C%22category_name%22%2C%22brand_name%22%2C%22shop_name%22%2C%22color%22%5D&tagFilters="
            }]
        }
        response = requests.post(URL, json=form_data, verify=False)  # POST the JSON form data
        print('Created new form data going to fetch data...')
        result = json.loads(response.text)  # parse the JSON response
        hits = result['results'][0]['hits']
        if not hits:  # an empty page means there is nothing left to fetch
            break
        for item in hits:  # loop over the product information
            print('-' * 100)
            print('Brand Name: ', item['brand_name'])
            print('Category Name: ', item['category_name'])
            print('Discount Price: ', item['discounted_price'])
            print('Max Price: ', item['max_price'])
            print('Min Price: ', item['min_price'])
            print('Product Name: ', item['name'])
            print('Product Image: ', item['product_image'])
            print('Shop Item ID: ', item['shop_item_id'])
            print('Shop Name: ', item['shop_name'])
            print('Slug Info: ', item['slug'])
            print('-' * 100)
        page_no += 1  # move on to the next page


scrap_evaly_data()
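The params value in the answer is assembled by hand from pre-encoded pieces. As a hedged alternative, the standard library can do the encoding, which lets the query be written as plain text. The sketch below is an illustration only: build_search_payload is a hypothetical helper, and it leaves out the highlight and facets parameters that the answer's string also carries.

```python
from urllib.parse import quote, urlencode


def build_search_payload(query, page, max_values_per_facet=10, index_name="products"):
    """Build the JSON body for one page of results (simplified sketch)."""
    # quote_via=quote encodes spaces as %20, matching the answer's query string
    params = urlencode(
        {"query": query, "maxValuesPerFacet": max_values_per_facet, "page": page},
        quote_via=quote,
    )
    return {"requests": [{"indexName": index_name, "params": params}]}


payload = build_search_payload("remax 610", page=0)
print(payload["requests"][0]["params"])  # query=remax%20610&maxValuesPerFacet=10&page=0
```

The returned dictionary has the same shape as form_data in the answer, so it could be passed to requests.post(URL, json=payload) in the same way.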
Python + Selenium: How to switch to overlay
driver.find_element_by_class_name("lnkClassInfo").click() time.sleep(2) element = driver.find_element_by_css_selector("#popup input[value='BOOK THIS CLASS NOW']") driver.execute_script("arguments[0].click();", element) ERROR: selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":"#popup input[value='BOOK THIS CLASS NOW']"} Line 1 of my code allows me to click into the above time-slot on the Main Page, which triggers an Overlay to pop up. My goal is to click the class-booking button on the overlay. Based on my understanding, Python needs to switch to the overlay/iframe. Please advise on how to code this for my attempts have been unsuccessful so far. Relevant Main Page HTML: <script type="text/javascript"> $(document).ready(function () { setPageScroll('dashboard'); $(".club-selections a").click(function () { $("#ctl00_cphContents_ddlClub").val($(this).attr('rel')); $("form:first").submit(); }); $(".schedule-selection a").click(function () { $("#ctl00_cphContents_ddlSchedule").val($(this).attr('rel')); $("form:first").submit(); }); $(".club-selections a[rel=" + $("#ctl00_cphContents_ddlClub").val() + "]").addClass('selected'); $(".schedule-selection a[rel=" + $("#ctl00_cphContents_ddlSchedule").val() + "]").addClass('selected'); $(".lnkClassInfo").click(function () { $.colorbox({ href: $(this).attr("href"), title: $(this).attr("title"), transition: 'fade', iframe: true, width: 566, height: 600, fixed: true }); return false; }); var FIREFOX = /Firefox/i.test(navigator.userAgent); if (FIREFOX) { $('.tbl-wrapper .tbl-container').scroll(function () { $('title').html($(this).scrollLeft()); $(this).find('a.lnkClassInfo').css('margin-right', $(this).scrollLeft() + "px"); }); } }); </script> <a href='popup/class-info.aspx?tcl_id=307632' title='Class Info.' 
style='line-height:74px; height:74px; top:413px;' class='lnkClassInfo'><span class='class-name'>ICE II</span><span>- Zaf -<br />11:30 - 12:30 PM<br /><img src='https://trueclassbooking.com.sg/userfiles/class-tags/cover.jpg' alt='Cover' title='Cover' style='height:16px;' /></span></a> Overlay HTML: <html xmlns="http://www.w3.org/1999/xhtml"> <head><meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> <meta name="theme-color" content="#592954" /><meta name="viewport" content="width=1024, user-scalable=yes" /><title> CBSS System - Class Info. </title> <link href="https://trueclassbooking.com.sg/member/css/style.all.css?v=2.22.03" rel="stylesheet" type="text/css" /> <!--[if IE]> <link href="https://trueclassbooking.com.sg/member/css/ie.css?v=2.22.03" rel="stylesheet" type="text/css" /> <![endif]--> <!--[if IE 8]> <link href="https://trueclassbooking.com.sg/member/css/ie8.css?v=2.22.03" rel="stylesheet" type="text/css" /> <![endif]--> </head> <body> <form name="aspnetForm" method="post" action="class-info.aspx?tcl_id=307632" id="aspnetForm"> <div> <input type="hidden" name="__EVENTTARGET" id="__EVENTTARGET" value="" /> <input type="hidden" name="__EVENTARGUMENT" id="__EVENTARGUMENT" value="" /> </div> <script type="text/javascript"> //<![CDATA[ var theForm = document.forms['aspnetForm']; if (!theForm) { theForm = document.aspnetForm; } function __doPostBack(eventTarget, eventArgument) { if (!theForm.onsubmit || (theForm.onsubmit() != false)) { theForm.__EVENTTARGET.value = eventTarget; theForm.__EVENTARGUMENT.value = eventArgument; theForm.submit(); } } //]]> </script> <div> <input type="hidden" name="__VIEWSTATEGENERATOR" id="__VIEWSTATEGENERATOR" value="4B353318" /> <input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="/wEWAgKm0IbODAKt8aSHB8YJdcVegLzMqZZLYYPUOKYp+jf1" /> </div> <!-- MESSENGER --> <div id="message-wrapper" class="popup-message-wrapper" title="Click to hide."></div> <!-- CONTENTS --> <div id="popup"> <div 
class="class-info"> <div class="header"> <div class="left al"> <div class="time">11:30 - 12:30 PM</div> </div> <div class="right ar" style='display:none;'> <span style="color:#b91be0"></span> SLOTS <span style="color:#999999;">|</span> <span style="color:#cc0066"></span> SLOTS AVAILABLE </div> </div> <div class="header"> <div class="left al"> <div class="class-name">ICE II - Zaf</div> </div> <div class="right ar"> <a id="ctl00_cphContents_btnBook" class="btn-gradient" href="javascript:__doPostBack('ctl00$cphContents$btnBook','')">BOOK THIS CLASS NOW</a> </div> </div> <hr /> <div class="description"> <div class="header"> <div class="left"> ICE II </div> <div class="right"> <img src='https://trueclassbooking.com.sg/userfiles/class-tags/cover.jpg' alt='Cover' title='Cover' style='height:16px;' /> </div> </div> <p><span style="font-size:24px"><strong>Indoor Cycling Experience (I.C.E.) I</strong></span><br /> Highly recommended for those who want a solid foundation in bike set-up & cycling technique. A high energy workout, this indoor cycling workout is paced with light and music settings to create a vibrant atmosphere.<br /> <br /> Push your limits and get intense through warm-up, sprints, climbs and cool-down segments , with changing body positions, pedal speed and resistance.<br /> <br /> <span style="font-size:24px"><strong>Indoor Cycling Experience (I.C.E.) II</strong></span><br /> This programme has proven to be one of the most well-received fat-loss programmes thus far! Simple yet hyper-challenging!<br /> <br /> <span style="font-size:24px"><strong>Indoor Cycling Experience (I.C.E.) III</strong></span><br /> Expectations from participants as well as instructors alike are astronomical! Must have had regular training with Spin II to attempt this class. 
</p> </div> </div> </div> </form> <script type="text/javascript" src="https://trueclassbooking.com.sg/member/js/script.all.js?v=2.22.03"></script> <!-- JAVSCRIPTS --> <script> (function (i, s, o, g, r, a, m) { i['GoogleAnalyticsObject'] = r; i[r] = i[r] || function () { (i[r].q = i[r].q || []).push(arguments) }, i[r].l = 1 * new Date(); a = s.createElement(o), m = s.getElementsByTagName(o)[0]; a.async = 1; a.src = g; m.parentNode.insertBefore(a, m) })(window, document, 'script', '//www.google-analytics.com/analytics.js', 'ga'); ga('create', 'UA-45242383-1', 'trueclassbooking.com.sg'); ga('send', 'pageview'); </script> <!-- EXCEPTIONS --> </body> </html>
You should not switch to the overlay: it's just a simple modal window (a div node), so there is no frame to switch to. Also, the target element isn't an input but a link:
<a id="ctl00_cphContents_btnBook" class="btn-gradient" href="javascript:__doPostBack('ctl00$cphContents$btnBook','')">BOOK THIS CLASS NOW</a>
Try the code below to wait for the link to become clickable and then click it:
from selenium.webdriver.support.ui import WebDriverWait as wait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

# Open the class-info modal, then wait up to 5 seconds for the booking link
driver.find_element(By.CLASS_NAME, "lnkClassInfo").click()
wait(driver, 5).until(EC.element_to_be_clickable((By.LINK_TEXT, "BOOK THIS CLASS NOW"))).click()
Here you can find more info about what an iframe is and what it might look like...
How to prettify HTML so tag attributes will remain in one single line?
I have this little piece of code: text = """<html><head></head><body> <h1 style=" text-align: center; ">Main site</h1> <div> <p style=" color: blue; text-align: center; ">text1 </p> <p style=" color: blueviolet; text-align: center; ">text2 </p> </div> <div> <p style="text-align:center"> <img src="./foo/test.jpg" alt="Testing static images" style=" "> </p> </div> </body></html> """ import sys import re import bs4 def prettify(soup, indent_width=4): r = re.compile(r'^(\s*)', re.MULTILINE) return r.sub(r'\1' * indent_width, soup.prettify()) soup = bs4.BeautifulSoup(text, "html.parser") print(prettify(soup)) The output of the above snippet right now is: <html> <head> </head> <body> <h1 style=" text-align: center; "> Main site </h1> <div> <p style=" color: blue; text-align: center; "> text1 </p> <p style=" color: blueviolet; text-align: center; "> text2 </p> </div> <div> <p style="text-align:center"> <img alt="Testing static images" src="./foo/test.jpg" style=" "/> </p> </div> </body> </html> I'd like to figure out how to format the output so it becomes this instead: <html> <head> </head> <body> <h1 style="text-align: center;"> Main site </h1> <div> <p style="color: blue;text-align: center;"> text1 </p> <p style="color: blueviolet;text-align: center;"> text2 </p> </div> <div> <p style="text-align:center"> <img alt="Testing static images" src="./foo/test.jpg" style=""/> </p> </div> </body> </html> In other words, I'd like to keep HTML statements such as <tag attrib1=value1 attrib2=value2 ... attribn=valuen> on a single line if possible. By "if possible" I mean without altering the meaning of the attribute values themselves (value1, value2, ..., valuen). Is this possible to achieve with beautifulsoup4? As far as I've read in the docs, you can use a custom formatter, but I don't know how to write one that meets the requirements described above. 
EDIT: @alecxe's solution is quite simple, but unfortunately it fails in more complex cases such as the one below:
test1 = """
<div id="dialer-capmaign-console" class="fill-vertically" style="flex: 1 1 auto;"> <div id="sessionsGrid" data-columns="[ { field: 'dialerSession.startTime', format:'{0:G}', title:'Start time', width:122 }, { field: 'dialerSession.endTime', format:'{0:G}', title:'End time', width:122, attributes: {class:'tooltip-column'}}, { field: 'conversationStartTime', template: cty.ui.gct.duration_dialerSession_conversationStartTime_endTime, title:'Duration', width:80}, { field: 'dialerSession.caller.lastName',template: cty.ui.gct.person_dialerSession_caller_link, title:'Caller', width:160 }, { field: 'noteType',template:cty.ui.gct.nameDescription_noteType, title:'Note type', width:150, attributes: {class:'tooltip-column'}}, { field: 'note', title:'Note'} ]"> </div> </div>
"""

from bs4 import BeautifulSoup
import re

def prettify(soup, indent_width=4, single_lines=True):
    if single_lines:
        for tag in soup():
            for attr in tag.attrs:
                # Debug output: shows the type of each attribute value
                print(tag.attrs[attr], tag.attrs[attr].__class__)
                tag.attrs[attr] = " ".join(
                    tag.attrs[attr].replace("\n", " ").split())
    r = re.compile(r'^(\s*)', re.MULTILINE)
    return r.sub(r'\1' * indent_width, soup.prettify())

def html_beautify(text):
    soup = BeautifulSoup(text, "html.parser")
    return prettify(soup)

print(html_beautify(test1))

TRACEBACK:
dialer-capmaign-console <class 'str'>
['fill-vertically'] <class 'list'>
Traceback (most recent call last):
  File "d:\mcve\x.py", line 35, in <module>
    print(html_beautify(test1))
  File "d:\mcve\x.py", line 33, in html_beautify
    return prettify(soup)
  File "d:\mcve\x.py", line 25, in prettify
    tag.attrs[attr].replace("\n", " ").split())
AttributeError: 'list' object has no attribute 'replace'
BeautifulSoup tried to preserve the newlines and multiple spaces you had in the attribute values in the input HTML. One workaround here would be to iterate over the element attributes and clean them up prior to prettifying, removing the newlines and replacing multiple consecutive spaces with a single space:
for tag in soup():
    for attr in tag.attrs:
        tag.attrs[attr] = " ".join(tag.attrs[attr].replace("\n", " ").split())

print(soup.prettify())
Prints:
<html>
 <head>
 </head>
 <body>
  <h1 style="text-align: center;">
   Main site
  </h1>
  <div>
   <p style="color: blue; text-align: center;">
    text1
   </p>
   <p style="color: blueviolet; text-align: center;">
    text2
   </p>
  </div>
  <div>
   <p style="text-align:center">
    <img alt="Testing static images" src="./foo/test.jpg" style=""/>
   </p>
  </div>
 </body>
</html>
Update (to address the multi-valued attributes like class): You just need a slight modification that adds special handling for the case when an attribute value is a list:
for tag in soup():
    tag.attrs = {
        attr: [" ".join(attr_value.replace("\n", " ").split()) for attr_value in value]
        if isinstance(value, list)
        else " ".join(value.replace("\n", " ").split())
        for attr, value in tag.attrs.items()
    }
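The string-versus-list handling in the update above can be tried without BeautifulSoup installed. The following is only a minimal sketch: collapse_whitespace, normalize_attrs and sample_attrs are hypothetical names, and the plain dict merely stands in for a tag's attrs mapping. It also relies on the fact that str.split() with no arguments already splits on newlines, so the separate replace("\n", " ") step is not strictly needed.

```python
def collapse_whitespace(text):
    """Replace every run of whitespace (newlines included) with one space."""
    return " ".join(text.split())


def normalize_attrs(attrs):
    """Return a copy of an attribute mapping with whitespace collapsed.

    Values may be plain strings or lists of strings; BeautifulSoup uses
    lists for multi-valued attributes such as class.
    """
    return {
        name: [collapse_whitespace(item) for item in value]
        if isinstance(value, list)
        else collapse_whitespace(value)
        for name, value in attrs.items()
    }


# Hypothetical stand-in for tag.attrs of a single element.
sample_attrs = {
    "id": "sessionsGrid",
    "class": ["fill-vertically"],
    "style": " color: blue;\n   text-align: center; ",
}

normalized = normalize_attrs(sample_attrs)
print(normalized)
```

With a real soup, the same helper would presumably be applied as tag.attrs = normalize_attrs(tag.attrs) for each tag in soup(), which is equivalent to the comprehension shown in the update.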
While BeautifulSoup is more commonly used, HTML Tidy may be a better choice if you're working with quirks and have more specific requirements. After installing the library for Python (pip install pytidylib), try the following code:
from tidylib import Tidy

tidy = Tidy()
# assign string to text
config = {
    "doctype": "omit",
    # "show-body-only": True
}
# tidy_document returns (html, errors); print only the HTML
print(tidy.tidy_document(text, options=config)[0])
tidy.tidy_document returns a tuple with the HTML and any errors that may have occurred. This code will output:
<html> <head> <title></title> </head> <body> <h1 style="text-align: center;"> Main site </h1> <div> <p style="color: blue; text-align: center;"> text1 </p> <p style="color: blueviolet; text-align: center;"> text2 </p> </div> <div> <p style="text-align:center"> <img src="./foo/test.jpg" alt="Testing static images" style=""> </p> </div> </body> </html>
Uncommenting "show-body-only": True gives the following for the second sample:
<div id="dialer-capmaign-console" class="fill-vertically" style="flex: 1 1 auto;"> <div id="sessionsGrid" data-columns="[ { field: 'dialerSession.startTime', format:'{0:G}', title:'Start time', width:122 }, { field: 'dialerSession.endTime', format:'{0:G}', title:'End time', width:122, attributes: {class:'tooltip-column'}}, { field: 'conversationStartTime', template: cty.ui.gct.duration_dialerSession_conversationStartTime_endTime, title:'Duration', width:80}, { field: 'dialerSession.caller.lastName',template: cty.ui.gct.person_dialerSession_caller_link, title:'Caller', width:160 }, { field: 'noteType',template:cty.ui.gct.nameDescription_noteType, title:'Note type', width:150, attributes: {class:'tooltip-column'}}, { field: 'note', title:'Note'} ]"></div> </div>
See the HTML Tidy configuration options for further customization. There are wrapping options specific to attributes which may help. As you can see, empty elements will only take up one line, and html-tidy will automatically try to add things like DOCTYPE, head and title tags.