Basically I am following this tutorial: http://blog.jimmy.schementi.com/2010/03/pycon-2010-python-in-browser.html
According to it, this code should run fine:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<script type="text/javascript"
src="http://gestalt.ironpython.net/dlr-20100305.js"></script>
<script type="text/python" src="http://github.com/jschementi/pycon2010/raw/master/repl.py"></script>
</head>
<body>
<script type="text/python">
window.Alert("Hello from Python!")
</script>
</body>
</html>
And in fact, it does, for example here: http://ironpython.net/browser/examples/pycon2010/start.html
You will see it if you have Silverlight installed.
But the problem is that I can't get the same code to run on my PC. I create a text file, copy this code into it, save it as test.html, and open it with Firefox, but nothing happens. The code does not execute; I just get a blank page.
I can't understand why the same code runs at http://ironpython.net/browser/examples/pycon2010/start.html but not on my PC, given that it is client-side code, not server-side.
It's failing to download repl.py; this looks like a bug: it falls back to the DOM downloader for cross-domain downloads, but then throws. As a work-around, copy repl.py to your web server as well; here it is working: http://www.schementi.com/silverlight/Sunny88.html.
Also, locally you must run under a local web server, as Silverlight isn't able to download files from the http:// zone while running from the file:// zone.
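A minimal sketch of such a local server, using Python's built-in http.server module (the port number is arbitrary): run it from the folder that contains test.html and repl.py, then open http://localhost:8000/test.html in the browser.
# Serve the current folder over HTTP so Silverlight loads everything
# from the http:// zone instead of file://.
import http.server
import socketserver

PORT = 8000  # any free port works

with socketserver.TCPServer(("", PORT), http.server.SimpleHTTPRequestHandler) as httpd:
    httpd.serve_forever()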
I am trying to get the source code of a particular site with the help of Selenium.
Python code:
driver.page_source
But it returns the source only after it has been encoded.
The raw file:
<html>
<head>
<title>AAAAAAAA</title>
</head>
<body>
</body>
When I press 'View page source' in Chrome, I see the correct raw source without encoding.
How can this be achieved?
You can try using JavaScript, instead of the built-in page_source property, to get the page source:
javascriptPageSource = driver.execute_script("return document.body.outerHTML;")
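If you also need the <head> (not just the body), the same approach works on the document element; a small sketch, assuming driver is an already-initialized Selenium WebDriver:
# Return the whole rendered document, including <head>, not just <body>.
full_source = driver.execute_script("return document.documentElement.outerHTML;")
print(full_source)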
I have a script reference in my HTML file like:
<script defer src="https://use.fontawesome.com/releases/v5.0.8/js/all.js">
Now when I use this with Flask's render_template("test.html"), it seems it's not able to load these files over the internet, which is why I don't see them loaded properly. What's the way to do this in Flask so that all the JS and CSS loaded over the internet load properly?
Remove the defer attribute and put the script tag in the <head>. That makes sure it will load before you need it. When deferred, your page loaded first, so the font was not available yet:
<head>
<script src="https://use.fontawesome.com/releases/v5.0.8/js/all.js"></script>
</head>
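For reference, a minimal sketch of the Flask side (the app layout and the template name test.html are assumptions); external CDN files load fine from a rendered template as long as the <script> tag sits in the template's <head>:
# Minimal Flask sketch; templates/test.html holds the Font Awesome <script> tag in its <head>.
from flask import Flask, render_template

app = Flask(__name__)

@app.route("/")
def index():
    return render_template("test.html")

if __name__ == "__main__":
    app.run(debug=True)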
I need a URL to be opened specifically in the IE browser.
I know the following code is wrong, but I don't know what else to try.
How can I achieve that through Python?
template
<a href="{% url 'ie' %}">Open in IE</a>
urls.py
url(r'^open-ie$', views.open_in_ie, name='ie'),
views.py
import webbrowser
def open_in_ie(request):
ie = webbrowser.get(webbrowser.iexplore)
return ie.open('https://some-link.com')
Again, I know this is wrong and that it tries to open the IE browser at the server level. Any advice? Thank you!
Short answer: you can't.
Long answer: if the user is viewing your website in IE, you can open links in other browsers. But if the user is using any other browser (Firefox, Chrome, etc.), all links will open in that same browser; you can't reach other browsers. So in your case the answer is no, because you are trying to open IE from some other browser.
Here is the code to open another browser from IE if you are interested:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html lang="en">
<head>
<title>HTA Test</title>
<hta:application applicationname="HTA Test" scroll="yes" singleinstance="yes">
<script type="text/javascript">
function openURL()
{
var shell = new ActiveXObject("WScript.Shell");
shell.run("http://www.google.com");
}
</script>
</head>
<body>
<input type="button" onclick="openURL()" value="Open Google">
</body>
</html>
Code from here
I'm trying to grab the content of the following url:
https://docs-05-dot-polymer-project.appspot.com/0.5/articles/demos/spa/final.html
My goal is to grab the content (source code) of the webpage as seen by the visitor, i.e. after all the JavaScript has rendered.
To do so I used the example mentioned here: http://techstonia.com/scraping-with-phantomjs-and-python.html
That example works on my server. But the challenge is to also have it work for Polymer-based SPA sites like the one mentioned. Those are websites that are really rendered by JavaScript.
My code looks like:
import platform
from bs4 import BeautifulSoup
from selenium import webdriver
# PhantomJS files have different extensions
# under different operating systems
if platform.system() == 'Windows':
    PHANTOMJS_PATH = './phantomjs.exe'
else:
    PHANTOMJS_PATH = './phantomjs'
# here we'll use the headless browser PhantomJS,
# but it can be replaced with browser = webdriver.Firefox(),
# which is good for debugging.
browser = webdriver.PhantomJS(PHANTOMJS_PATH)
browser.get('https://docs-05-dot-polymer-project.appspot.com/0.5/articles/demos/spa/final.html')
print(browser.page_source)
The issue is that it delivers the following result:
<!DOCTYPE html>
<html><head>
<meta charset="utf-8">
<meta content="width=device-width, minimum-scale=1.0, initial-scale=1.0, user-scalable=yes" name="viewport">
<title>Single page app using Polymer</title>
<script async="" src="//www.google-analytics.com/analytics.js"></script><script src="/webcomponents.min.js"></script>
<!-- vulcanized version of imported elements --
see "elements.html" for unvulcanized list of imports. -->
<link href="vulcanized.html" rel="import">
<link href="styles.css" rel="stylesheet" shim-shadowdom="">
</link></link></meta></meta></head>
<body fullbleed="" unresolved="">
<template id="t" is="auto-binding">
<!-- Route controller. -->
<flatiron-director autohash="" route="{{route}}"></flatiron-director>
<!-- Keyboard nav controller. -->
<core-a11y-keys id="keys" keys="up down left right space space+shift" on-keys-pressed="{{keyHandler}}" target="{{parentElement}}"></core-a11y-keys>
<core-scaffold id="scaffold">
<nav>
<core-toolbar>
<span>Single Page Polymer</span>
</core-toolbar>
<core-menu on-core-select="{{menuItemSelected}}" selected="{{route}}" selectedmodel="{{selectedPage}}" valueattr="hash">
<template repeat="{{page, i in pages}}">
<paper-item hash="{{page.hash}}" noink="">
<core-icon icon="label{{route != page.hash ? '-outline' : ''}}"></core-icon>
{{page.name}}
</paper-item>
</template>
</core-menu>
</nav>
<core-toolbar flex="" tool="">
<div flex="">{{selectedPage.page.name}}</div>
<core-icon-button icon="refresh"></core-icon-button>
<core-icon-button icon="add"></core-icon-button>
</core-toolbar>
<div center-center="" fit="" horizontal="" layout="">
<core-animated-pages id="pages" on-tap="{{cyclePages}}" selected="{{route}}" transitions="slide-from-right" valueattr="hash">
<template repeat="{{page, i in pages}}">
<section center-center="" hash="{{page.hash}}" layout="" vertical="">
<div>{{page.name}}</div>
</section>
</template>
</core-animated-pages>
</div>
</core-scaffold>
</template>
<script src="app.js"></script>
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-43475701-2', 'auto'); // ebidel's
ga('create', 'UA-39334307-1', 'auto'); // pp.org
ga('send', 'pageview');
</script>
</body></html>
As you see, this is far from the real result you get when looking at the page with your browser.
My questions: what am I doing wrong, and if possible, where should I look for the solution?
I think you are missing something from the Selenium Webdriver docs.
You can get the content of a dynamic page, but you have to make sure that the element you are searching for is present and visible on the page:
from selenium import webdriver

browser = webdriver.PhantomJS()
browser.get('https://docs-05-dot-polymer-project.appspot.com/0.5/articles/demos/spa/final.html')
# Get the content of the first slide
res1 = browser.find_element_by_xpath('//*[@id="pages"]/section[1]/div')
# Save a screenshot so you can see why it is failing (if it is)
browser.save_screenshot('screen_test.png')
# Print the text within the div
print(res1.text)
If you also need the text of the other slides, you need to click (using the webdriver) on whatever makes the second slide visible before getting its text.
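If the slide content is not in the DOM yet when you read it, an explicit wait is the usual fix. A sketch using the same XPath as above (the 10-second timeout is arbitrary):
# Wait for the first slide's <div> to be present before reading its text.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

browser = webdriver.PhantomJS()
browser.get('https://docs-05-dot-polymer-project.appspot.com/0.5/articles/demos/spa/final.html')

slide = WebDriverWait(browser, 10).until(
    EC.presence_of_element_located((By.XPATH, '//*[@id="pages"]/section[1]/div'))
)
print(slide.text)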
I want to scrape a web page, so I am trying to download everything: images, .js files and also .css files. To download the .css files I wrote:
for item in self.soup.findAll('link', {'type':'text/css','href':True}):
print item['href']
# do some things
And it usually works quite well, but I found some pages for which it doesn't work, and I cannot understand why. For example this page: http://www.nasa.gov. If I open this page in my browser and save it as a file, I can see that the source contains:
<link media="all" href="NASA_files/widget120.css" type="text/css" rel="stylesheet">
<link media="screen" rel="stylesheet" href="NASA_files/sayt.css" type="text/css">
and a few more. But when I run my code it doesn't print anything. The question is: what am I doing wrong?
If you run your code on just the HTML you posted, it works.
It's not working when you fetch nasa.gov in your script because the actual source of that page does not include those elements. There are a bunch of inline <style> elements with @import rules in them. The <link> elements are probably added using JavaScript after the page is loaded.
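One way to see (and work around) this is to compare the raw HTML with the browser-rendered DOM; a sketch, assuming Selenium with Firefox is available alongside requests and BeautifulSoup:
# The raw HTML has no stylesheet <link> tags; the rendered DOM does.
import requests
from bs4 import BeautifulSoup
from selenium import webdriver

url = 'http://www.nasa.gov'

# Raw HTML exactly as the server sends it.
raw_soup = BeautifulSoup(requests.get(url).text, 'html.parser')
print(len(raw_soup.findAll('link', {'type': 'text/css', 'href': True})))

# DOM after the browser has run the page's JavaScript.
browser = webdriver.Firefox()
browser.get(url)
rendered_soup = BeautifulSoup(browser.page_source, 'html.parser')
for item in rendered_soup.findAll('link', {'type': 'text/css', 'href': True}):
    print(item['href'])
browser.quit()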