How to keep Python from treating a string as HTML in web2py?

I'm working in web2py and I'm trying to print out HTML code from the controller, which is written in Python. The issue is that even when I write the HTML in a string in Python, the page renders that string as if it were normal HTML. This seems like there would be a simple fix, but I have not been able to find an answer. Here is the specific code:
return ("Here is the html I'm trying to show: <img src={0}>".format(x))
The resulting page shows "Here is the html I'm trying to show: " and then the rest is blank. If I inspect the page, the rest of the code is still there, which means it is being read, just not displayed. So I just need a way to keep the HTML that is in the string from being interpreted as HTML. Any ideas?

If you want to send HTML markup but have the browser treat it and display it as plain text, then simply set the HTTP Content-Type header appropriately. For example, in the web2py controller:
def myfunc():
    ...
    response.headers['Content-Type'] = 'text/plain'
    return ("Here is the html I'm trying to show: <img src={0}>".format(x))
On the other hand, if you want the browser to treat and render the response as HTML and you care only about how it is displayed in the browser (but not about the actual text characters in the returned content), you can simply escape the HTML markup. web2py provides the xmlescape function for this purpose:
def myfunc():
    x = '/static/myimage.png'
    html = xmlescape("<img src={0}>".format(x))
    return ("Here is the html I'm trying to show: {0}".format(html))
The above will return the following to the browser:
Here is the html I'm trying to show: &lt;img src=/static/myimage.png&gt;
which the browser will display as:
Here is the html I'm trying to show: <img src=/static/myimage.png>
Note, if you instead use a web2py template to generate the response, any HTML markup inserted will automatically be escaped. For example, you could have a myfunc.html template like the following:
{{=markup}}
And in the controller:
def myfunc():
    ...
    return dict(markup="Here is the html I'm trying to show: <img src={0}>".format(x))
In that case, web2py will automatically escape the content inserted via {{=markup}} (so no need to explicitly call xmlescape).

I take it you are trying to view this string in a web browser.
To show the raw HTML and keep the browser from rendering it, you can wrap it in <xmp> tags:
return ("Here is the html I'm trying to show: <xmp><img src={0}></xmp>".format(x))

Related

trying to download full HTML pages

I am trying to download a few hundred HTML pages in order to parse them and calculate some measures.
I tried it with Linux wget, and with a loop of the following code in Python:
import urllib.request
url = "https://www.camoni.co.il/411788/168022"
html = urllib.request.urlopen(url).read()
But the HTML file I get doesn't contain all the content I see in the browser on the same page. For example, text I see on the screen is not found in the HTML file. Only when I right-click the page in the browser and choose "Save As" do I get the full page.
The problem: I need a large number of pages and cannot do this by hand.
URL example: https://www.camoni.co.il/411788/168022 (the last number changes).
Thank you.
That's because that site is not static. It uses JavaScript (in this case the jQuery library) to fetch additional data from the server and insert it into the page.
So instead of trying to GET the raw HTML, you should inspect the requests in the browser's developer tools. There is a POST request to https://www.camoni.co.il/ajax/tabberChangeTab with data such as:
tab_name=tab_about
memberAlias=ד-ר-דינה-ראלט-PhD
currentURL=/411788/ד-ר-דינה-ראלט-PhD
And the result is the HTML that is then inserted into the page.
So instead of just downloading the page, inspect the page and its requests to get the data, or use a headless browser such as Google Chrome to emulate the 'Save As' button and save the data. A minimal sketch of the request-based approach follows.
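As a rough illustration only, here is a sketch of that POST request using the requests library, based solely on the endpoint and form fields listed above; the field values are page-specific and may need adjusting for other member pages:
import requests
# Endpoint and form fields copied from the browser's Network tab (see above).
# The memberAlias/currentURL values belong to one specific member page.
url = "https://www.camoni.co.il/ajax/tabberChangeTab"
data = {
    "tab_name": "tab_about",
    "memberAlias": "ד-ר-דינה-ראלט-PhD",
    "currentURL": "/411788/ד-ר-דינה-ראלט-PhD",
}
response = requests.post(url, data=data)
response.raise_for_status()
print(response.text)  # the HTML fragment the site would insert into the page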

How to set jinja value as html tag

I am working on a Flask blog app.
In this app, I am using MySQL and passing the data from app.py to the template as posts=posts.
In index.html I am getting the values of post_title, slug, and content.
What I want is for the content to include HTML tags, so I wrote the content with HTML tags in the database and retrieved it on the index page. But the HTML tags are not working, because the value coming from Jinja is rendered as plain text, so the tags are not processed.
For example:
content = <h1>Hello</h1><br> (just illustrating the value, not actual code)
I push the content to the MySQL database, then retrieve it in index.html inside a <p> tag using a Jinja variable.
But the result is:
<p>"<h1>Hello</h1><br>"</p>
so the output is the literal text <h1>Hello</h1><br>, while I want it rendered as a heading that just says Hello.
Please help me out.
Using JavaScript:
A <p> tag with id "p" is already in the HTML document.
Then, with JS, set its innerHTML to the HTML you want to insert:
document.getElementById("p").innerHTML = content;
It worked for me. A minimal sketch is shown below.
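A self-contained sketch of this approach using Flask's render_template_string; the route, variable names, and markup here are illustrative assumptions, not the asker's actual code:
from flask import Flask, render_template_string
app = Flask(__name__)
# The template ships an empty <p>; JS fills it so the browser renders the tags.
# The tojson filter safely quotes the Python string for use inside JavaScript.
PAGE = """
<p id="p"></p>
<script>
  document.getElementById("p").innerHTML = {{ content | tojson }};
</script>
"""
@app.route("/")
def index():
    content = "<h1>Hello</h1><br>"  # in the real app this value comes from MySQL
    return render_template_string(PAGE, content=content)
if __name__ == "__main__":
    app.run(debug=True)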

Requests won't get the text from web page?

I am trying to get the value of VIX from a webpage.
The code I am using:
import requests
from bs4 import BeautifulSoup
raw_page = requests.get("https://www.nseindia.com/live_market/dynaContent/live_watch/vix_home_page.htm").text
soup = BeautifulSoup(raw_page, "lxml")
vix = soup.find("span", {"id": "vixIdxData"})
print(vix.text)
This gives me:
' '
If I inspect vix, I see:
<span id="vixIdxData" style=" font-size: 1.8em;font-weight: bold;line-height: 20px;"></span>
On the site the element has text:
<span id="vixIdxData" style=" font-size: 1.8em;font-weight: bold;line-height: 20px;">15.785</span>
The 15.785 value is what I want to get by using requests.
The data you're looking for is not available in the page source, and requests.get(...) gets you only the page source, without the elements that are added dynamically through JavaScript. But you can still get the data using the requests module.
In the Network tab of the developer tools, you can see a file named VixDetails.json. A request is sent to https://www.nseindia.com/live_market/dynaContent/live_watch/VixDetails.json, which returns the data as JSON.
You can access it using the .json() method of the response object:
import requests
r = requests.get('https://www.nseindia.com/live_market/dynaContent/live_watch/VixDetails.json')
data = r.json()
vix_price = data['currentVixSnapShot'][0]['CURRENT_PRICE']
print(vix_price)
# 15.7000
When you open the page in a web browser, the text (e.g., 15.785) is inserted into the span element by the getIndiaVixData.js script.
When you get the page using requests in Python, only the HTML code is retrieved and no JavaScript processing is done. So, the span element stays empty.
It is impossible to get that data by solely parsing the HTML code of the page using requests.

How to get HTML with expanded containers using Python?

When I get the HTML of the page, e.g.
import urllib2
response = urllib2.urlopen('http://www.wunderground.com/us/fl/miami/precipitation')
html = response.read()
I get HTML with collapsed (empty) containers, e.g.
<h2>6-Hour Precipitation Forecast</h2>
<div id="precip-statement"></div>
<div id="precip-graph">
while the real HTML, as seen in the browser, contains the expanded forecast content.
Clearly, I need to extract the 6-hour forecast, which I cannot do while it is collapsed into <div id="precip-statement"></div>.
I will be very thankful if you can help me with this issue. Thank you.
The content is loaded dynamically using AJAX. You can sniff this request with Chrome: press F12 -> Network -> XHR and look at the requests. One of them (wwir.json) returns a nice JSON document that you can parse, for example:
import json
weather = json.loads(response.read())  # response returned by urlopen for the wwir.json URL
It looks like they use an API key from api.weather.com, which probably means you should get your own. A rough sketch follows.
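A minimal sketch, assuming you have copied the wwir.json request URL from the Network tab; the URL below is only a placeholder, and your own api.weather.com API key would go into it:
import json
import urllib2
# Placeholder URL: replace it with the actual wwir.json request URL (including
# your own apiKey parameter) as seen in the browser's Network tab.
WWIR_URL = "https://api.weather.com/.../wwir.json?apiKey=YOUR_API_KEY"
response = urllib2.urlopen(WWIR_URL)
weather = json.loads(response.read())
print(weather)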
