Django template error while rendering a unicode string

I am passing a Python list to a Django template like this.
In views.py:
dat = [-77.448599999999999, 37.536900000000003, u'Virginia War Memorial', 1.0]
result={"dat":dat}
return render(request,'result_map.html', {'form': form,'result':result})
and in my template:
var dat = {{ result.dat }}
but in the rendered html I get:
var dat = [-77.448599999999999, 37.536900000000003, u'Virginia War Memorial', 1.0]
which gives an error.
How can I get:
var dat = [-77.448599999999999, 37.536900000000003, 'Virginia War Memorial', 1.0]
thanks

I think you need to pass it through the safe filter if you want the raw string:
{{ result.dat|safe }}
You can also use the {% autoescape off %} block if you want to affect a larger block of the template.

var dat = {{ result.dat }}
You're trying to inject into JavaScript, presumably in a <script> block. You need output that is formatted in JavaScript syntax, but you're implicitly converting a Python list to a string, which will give a Python-syntax list literal, not a JavaScript one. There are a number of differences, including text (Unicode) strings in the list being prefixed with u in Python 2 but not JS.
You're also using the default autoescaping for Django templates, which is HTML-escaping. That would be right for injecting into HTML content, but you're injecting into JavaScript, so you get wrong escapes like &#39; for '. You can prevent this by adding the |safe filter to avoid the HTML-escaping, but then you have no escaping at all, so your page will be vulnerable to script injection (cross-site scripting).
The right way to output a structured value into a <script> block is:
json.dumps to turn it into a JSON string
replace any instances of the character < in the string with the JSON string literal escape \u003C, so that the string </script> can't be used to escape from the <script> block and inject HTML
mark as safe.
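You can combine these operations into a custom filter. When this answer was written Django itself didn't ship one; Django 2.1 and later provide the built-in json_script filter, which covers this case.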
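The three steps above can be sketched in plain Python. This is a minimal sketch rather than Django's own API: the helper name json_for_script is hypothetical, and the final "mark as safe" step is only indicated in a comment because it needs Django's mark_safe.

```python
import json


def json_for_script(value):
    """Serialize value to JSON that can sit inside a <script> block.

    Hypothetical helper: in a Django project you would wrap the result in
    django.utils.safestring.mark_safe and register it as a template filter.
    """
    text = json.dumps(value)
    # "<" never appears literally, so "</script>" cannot end the block early.
    return text.replace("<", "\\u003C")


dat = [-77.4486, 37.5369, "Virginia War Memorial </script>", 1.0]
print(json_for_script(dat))
```

The escaped output is still valid JSON, so parsing it gives back the original list.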
A better approach is to avoid injecting into JavaScript at all. Instead write the json.dumps output into the HTML document using the standard escaping and read it from there using DOM:
return render(request, 'result_map.html', {'form': form, 'dat': json.dumps(dat)})
...
<body data-dat="{{ dat }}">
...
var dat= JSON.parse(document.body.getAttribute('data-dat'));
This lets you keep your JS completely out of your HTML pages, which in turn allows stricter Content-Security-Policy settings.
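As a rough check of why the data-attribute route is safe, the sketch below mimics the template's HTML-escaping with the standard library's html.escape. That is an assumption standing in for Django's autoescaping, and html.unescape stands in for the browser decoding the attribute before JSON.parse sees it.

```python
import html
import json

dat = [-77.4486, 37.5369, 'Virginia "War" Memorial <b>', 1.0]

# What the view passes to the template.
payload = json.dumps(dat)

# Stand-in for autoescaping of {{ dat }} inside data-dat="...".
attribute_value = html.escape(payload, quote=True)

# Stand-in for the browser decoding the attribute, followed by JSON.parse.
round_tripped = json.loads(html.unescape(attribute_value))

print(attribute_value)
print(round_tripped == dat)
```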

Related

How to replace line breaks (\n) with <br> for textual data to html content

I am loading data from a database. The textual data has line breaks \n as uploaded by the user, I would like to print this data on a html page. However line breaks do not occur. I have tried replacing \n with <br>, but that prints with <br> as part of the string instead of actually breaking the line.
This is how I do the replacement:
value['description'].replace('\n', '<br>')

I have realized that mine is a framework-specific issue: as @Demian Wolf's comment pointed out, my HTML markup was getting escaped. Since I am using the Django framework, adding the safe filter in the template was the solution.
{{ value.description | safe }}
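The symptom from the question can be reproduced outside Django. The sketch below uses html.escape from the standard library as a stand-in for template autoescaping (an assumption for illustration only) to show why the inserted <br> was displayed as text.

```python
import html

description = "first line\nsecond line"
with_breaks = description.replace("\n", "<br>")

# Autoescaping turns the markup into visible text instead of a line break.
rendered = html.escape(with_breaks)
print(rendered)  # first line&lt;br&gt;second line
```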

You can use the linebreaksbr template filter:
{{ value|linebreaksbr }}
or in your view
from django.template.defaultfilters import linebreaksbr
linebreaksbr(value['description'])

replace some html content with values

I want to send a verification email. The email consists of HTML, which I saved in a file at email_templates/verify.html. The problem is that some values in the HTML file are unknown until runtime. For instance, the email addresses the user by name, but since each email goes to a different person, I can't hard-code the name in the template. One solution that comes to mind is to use some formatting technique along the lines of
<div>
hello {usrname}!
</div>
and then in the Python code do something like:
lines = open('email_templates/verify.html', 'r').read()
lines = lines.format(usrname='joe')
This code can in fact work, but it has some issues:
every {} in the HTML file may be mistaken for a placeholder to be formatted
the code in its current form is not very readable
the code is not elegant
for an HTML reader who doesn't know Python, the formatting placeholders will be confusing
Is there a better way to approach this?

This can and should be done through templating.
You mentioned that Python placeholders might be confusing, but with a templating engine they are not: HTML still looks like HTML, and template tags look like template tags. The engine lays down the rules for which placeholders you can and can't use. Templating engines are also built and optimized for exactly this kind of substitution.
Let's understand by example:
There are several templating engines out there. Jinja2 is one of the best ones.
First, install Jinja2.
pip install jinja2
Second, create a Python file (name it anything you want) and a folder named 'templates'. Inside the 'templates' folder, create your verify.html.
Your folder structure should look like this:
folder1
|
|--> pythonfile.py
|--> templates
     |
     |--> verify.html
Third, put some sample code in the HTML file. This is the example I put in my verify.html:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Index</title>
</head>
<body>
<h1>Dear {{ user }}!</h1>
<h4>
Hope you are fine.
</h4>
<p>
Thank you for signing up. Here is your {{ coupon_code }}
</p>
</body>
</html>
This HTML file contains normal HTML tags, plus two placeholders written in double curly braces. The word inside the curly braces is treated as a variable by Jinja, and its value is supplied to the HTML file by our Python file.
Also, Jinja doesn't let you use just any delimiters. If I had written "<>" instead of "{{ }}", it would not have worked. So there are some rules to follow.
Read more here: Jinja allowed tags and filters
Fourth, copy this code into the python file we created.
# Imports
from jinja2 import Environment, FileSystemLoader
# Name of the folder where the template file is located.
file_loader = FileSystemLoader('templates')
# This object is needed to create a template object.
env = Environment(loader=file_loader)
# Path of the HTML file relative to the templates folder.
template = env.get_template('verify.html')
# Data dictionary to be supplied to our HTML file.
input_dict = {
    'user': 'Harry',
    'coupon_code': '12313ASDSA4',
}
# Render the HTML with the data substituted in.
output = template.render(input_dict)
print(output)
Now run this python file.
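For comparison, the brace problem raised in the question can be reproduced with the standard library alone. This is a small sketch, not part of the Jinja2 setup above: the page string is made up for illustration, and string.Template is shown only as a dependency-free alternative whose $name placeholders do not collide with CSS braces.

```python
from string import Template

page = "<style>p {color: red}</style><div>hello {usrname}!</div>"

try:
    page.format(usrname="joe")
except (KeyError, ValueError) as exc:
    # str.format treats the CSS braces as a replacement field.
    format_error = type(exc).__name__
    print("str.format failed:", format_error)

# string.Template only reacts to $name, so the CSS braces are left alone.
safe_page = Template("<style>p {color: red}</style><div>hello $usrname!</div>")
print(safe_page.substitute(usrname="joe"))
```

Unlike Jinja2, string.Template has no loops, conditionals or escaping, so it only suits very simple emails.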

How to replace '\n' in string with '<br>' in Flask app?

I've been trying to modify a string before passing it to my HTML page in Flask (replacing occurrences of '\n' with '<br>'), but the typical methods I use aren't working for some reason.
finalstring = textstring.replace('\n', '<br>')
return render_template('my-form-result.html', emailresponse = finalstring)
This should work, but for some reason, nothing is replaced. How can I get this to work? Thanks!

A better way to display \n line breaks in HTML is to use CSS, for example white-space: pre-line on the containing element.
Your replace() is all right. Debug your code and make sure the string actually contains \n before the replace() call.
To make the <br> tags render as line breaks, you have to use the safe filter in the template. But beware that this opens you up to XSS attacks. To get around this problem, escape the string before replacing the \n characters. This is the code:
# MarkupSafe is installed with Flask; flask.escape was deprecated in Flask 2.3
from markupsafe import escape
...
...
safe_html = str(escape(text)).replace('\n', '<br/>')
return render_template('[HTML file].html', safe_html=safe_html)
---------
#in the template:
<span> {{ safe_html | safe }} </span>
If you don't call str() before replace(), the <br/> will be escaped too, because escape() returns a Markup object rather than a plain string, and Markup.replace() escapes its arguments.
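The escape-then-replace order can be tried without Flask. The sketch below uses html.escape from the standard library in place of the framework's escape, and the helper name text_to_html is hypothetical.

```python
import html


def text_to_html(text):
    """Escape user text first, then turn newlines into <br/> tags."""
    return html.escape(text).replace("\n", "<br/>")


print(text_to_html("hi <script>alert(1)</script>\nbye"))
# hi &lt;script&gt;alert(1)&lt;/script&gt;<br/>bye
```

Only the <br/> added after escaping survives as markup; anything the user typed stays inert text.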

Disclaimer: I never worked with Flask, I just looked it up and hope it does what you want to do.
So somewhere in your template my-form-result.html you should find a line containing:
{{ emailresponse }}
You can replace this with:
{% for line in emailresponse.split('\n') %}
{{ line }}
<br />
{% endfor %}
This adds a <br /> after every line.

Your replace() code is correct. Make sure the HTML is not escaped in the template by marking it as safe:
{{ emailresponse|safe }}
To diagnose, try this:
finalstring = textstring.replace('\n', '<br>')
print(finalstring)
return render_template('my-form-result.html', emailresponse = finalstring)
Also, show us the source of the rendered web page, so we can see what the template actually outputs.

Textarea event handling with Python3

I wonder if someone can help me. I'm trying to do something like the following to get input events from a HTML text box and send them to a python function.
textarea = cgi.FieldStorage()
chars = textarea.getvalue('1')
def MyPythonFunction():
'Do something with chars'
print(<textarea oninput=MyPythonFunction() </textarea>)
I've tried all kinds of things but can't get it to work.
Thanks in advance

First, the oninput attribute of the textarea HTML tag expects JavaScript code and would interpret MyPythonFunction as a JavaScript function. You need to output an HTML form that contains a submit button, so that submitting the form invokes your server-side script. The form might look like:
<form name="f" action="my_script.py" method="post">
<textarea name="chars"></textarea>
<br>
<input type="submit" name="submit" value="Submit">
</form>
Your server side script, my_script.py, which must be executable, might look like:
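#!/usr/bin/env python
import cgi
import cgitb
import html
cgitb.enable()  # show tracebacks in the browser while debugging
form = cgi.FieldStorage()
chars = form.getvalue('chars', '')  # contents of the textarea
# A CGI script must print a header block, a blank line, then the body.
print("Content-Type: text/html")
print()
print("<p>You entered: %s</p>" % html.escape(chars))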
If you really wanted to process input a character at a time, you would remove the submit button and restore the oninput attribute. But then you would have to specify a JavaScript function to be invoked whenever the contents of the textarea change. This function would have to use a technique called AJAX to invoke your server-side Python script, passing the contents of the textarea as an argument. This is a rather advanced topic, but you can investigate it.

How do I perform HTML decoding/encoding using Python/Django?

I have a string that is HTML encoded:
'''&lt;img class=&quot;size-medium wp-image-113&quot;\
style=&quot;margin-left: 15px;&quot; title=&quot;su1&quot;\
src=&quot;http://blah.org/wp-content/uploads/2008/10/su1-300x194.jpg&quot;\
alt=&quot;&quot; width=&quot;300&quot; height=&quot;194&quot; /&gt;'''
I want to change that to:
<img class="size-medium wp-image-113" style="margin-left: 15px;"
title="su1" src="http://blah.org/wp-content/uploads/2008/10/su1-300x194.jpg"
alt="" width="300" height="194" />
I want this to register as HTML so that it is rendered as an image by the browser instead of being displayed as text.
The string is stored like that because I am using a web-scraping tool called BeautifulSoup, it "scans" a web-page and gets certain content from it, then returns the string in that format.
I've found how to do this in C# but not in Python. Can someone help me out?
Related
Convert XML/HTML Entities into Unicode String in Python

With the standard library:
HTML Escape
try:
    from html import escape  # python 3.x
except ImportError:
    from cgi import escape  # python 2.x
print(escape("<"))
HTML Unescape
try:
    from html import unescape  # python 3.4+
except ImportError:
    try:
        from html.parser import HTMLParser  # python 3.x (<3.4)
    except ImportError:
        from HTMLParser import HTMLParser  # python 2.x
    unescape = HTMLParser().unescape
print(unescape("&gt;"))

Given the Django use case, there are two answers to this. Here is its django.utils.html.escape function, for reference:
def escape(html):
    """Returns the given HTML with ampersands, quotes and carets encoded."""
    return mark_safe(force_unicode(html).replace('&', '&amp;').replace('<', '&lt;').replace('>', '&gt;').replace('"', '&quot;').replace("'", '&#39;'))
To reverse this, the Cheetah function described in Jake's answer should work, but is missing the single-quote. This version includes an updated tuple, with the order of replacement reversed to avoid symmetric problems:
def html_decode(s):
    """
    Returns the ASCII decoded version of the given HTML string. This does
    NOT remove normal HTML tags like <p>.
    """
    htmlCodes = (
        ("'", '&#39;'),
        ('"', '&quot;'),
        ('>', '&gt;'),
        ('<', '&lt;'),
        ('&', '&amp;')
    )
    for code in htmlCodes:
        s = s.replace(code[1], code[0])
    return s
unescaped = html_decode(my_string)
This, however, is not a general solution; it is only appropriate for strings encoded with django.utils.html.escape. More generally, it is a good idea to stick with the standard library:
# Python 2.x:
import HTMLParser
html_parser = HTMLParser.HTMLParser()
unescaped = html_parser.unescape(my_string)
# Python 3.0 to 3.3 (HTMLParser.unescape was removed in Python 3.9):
import html.parser
html_parser = html.parser.HTMLParser()
unescaped = html_parser.unescape(my_string)
# Python 3.4+:
from html import unescape
unescaped = unescape(my_string)
As a suggestion: it may make more sense to store the HTML unescaped in your database. It'd be worth looking into getting unescaped results back from BeautifulSoup if possible, and avoiding this process altogether.
With Django, escaping only occurs during template rendering; so to prevent escaping you just tell the templating engine not to escape your string. To do that, use one of these options in your template:
{{ context_var|safe }}
{% autoescape off %}
{{ context_var }}
{% endautoescape %}

For HTML encoding, there's cgi.escape from the standard library:
>>> help(cgi.escape)
cgi.escape = escape(s, quote=None)
    Replace special characters "&", "<" and ">" to HTML-safe sequences.
    If the optional flag quote is true, the quotation mark character (")
    is also translated.
For html decoding, I use the following:
import re
from htmlentitydefs import name2codepoint
# for some reason, python 2.5.2 doesn't have this one (apostrophe)
name2codepoint['#39'] = 39
def unescape(s):
    "unescape HTML code refs; c.f. http://wiki.python.org/moin/EscapingHtml"
    return re.sub('&(%s);' % '|'.join(name2codepoint),
                  lambda m: unichr(name2codepoint[m.group(1)]), s)
For anything more complicated, I use BeautifulSoup.

Use daniel's solution if the set of encoded characters is relatively restricted.
Otherwise, use one of the numerous HTML-parsing libraries.
I like BeautifulSoup because it can handle malformed XML/HTML :
http://www.crummy.com/software/BeautifulSoup/
For your question, there's an example in their documentation:
from BeautifulSoup import BeautifulStoneSoup
BeautifulStoneSoup("Sacr&eacute; bleu!",
                   convertEntities=BeautifulStoneSoup.HTML_ENTITIES).contents[0]
# u'Sacr\xe9 bleu!'

In Python 3.4+:
import html
html.unescape(your_string)
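A short usage sketch, with a shortened version of the question's string as sample input (the variable names are only for illustration):

```python
import html

encoded = '&lt;img title=&quot;su1&quot; alt=&quot;&quot; width=&quot;300&quot; /&gt;'
decoded = html.unescape(encoded)
print(decoded)  # <img title="su1" alt="" width="300" />

# Named, decimal and hexadecimal references are all handled.
print(html.unescape("&gt; &#62; &#x3e;"))  # > > >
```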

See the bottom of this page on the Python wiki; there are at least two options for unescaping HTML.

Daniel's comment as an answer:
"escaping only occurs in Django during template rendering. Therefore, there's no need for an unescape - you just tell the templating engine not to escape. either {{ context_var|safe }} or {% autoescape off %}{{ context_var }}{% endautoescape %}"

If anyone is looking for a simple way to do this via the django templates, you can always use filters like this:
<html>
{{ node.description|safe }}
</html>
I had some data coming from a vendor and everything I posted had html tags actually written on the rendered page as if you were looking at the source.

I found a fine function at: http://snippets.dzone.com/posts/show/4569
def decodeHtmlentities(string):
    import re
    entity_re = re.compile("&(#?)(\d{1,5}|\w{1,8});")
    def substitute_entity(match):
        from htmlentitydefs import name2codepoint as n2cp
        ent = match.group(2)
        if match.group(1) == "#":
            return unichr(int(ent))
        else:
            cp = n2cp.get(ent)
            if cp:
                return unichr(cp)
            else:
                return match.group()
    return entity_re.subn(substitute_entity, string)[0]

Even though this is a really old question, this may work.
Django 1.5.5
In [1]: from django.utils.text import unescape_entities
In [2]: unescape_entities('&lt;img class=&quot;size-medium wp-image-113&quot; style=&quot;margin-left: 15px;&quot; title=&quot;su1&quot; src=&quot;http://blah.org/wp-content/uploads/2008/10/su1-300x194.jpg&quot; alt=&quot;&quot; width=&quot;300&quot; height=&quot;194&quot; /&gt;')
Out[2]: u'<img class="size-medium wp-image-113" style="margin-left: 15px;" title="su1" src="http://blah.org/wp-content/uploads/2008/10/su1-300x194.jpg" alt="" width="300" height="194" />'

I found this in the Cheetah source code (here)
htmlCodes = [
    ['&', '&amp;'],
    ['<', '&lt;'],
    ['>', '&gt;'],
    ['"', '&quot;'],
]
htmlCodesReversed = htmlCodes[:]
htmlCodesReversed.reverse()
def htmlDecode(s, codes=htmlCodesReversed):
    """ Returns the ASCII decoded version of the given HTML string. This does
    NOT remove normal HTML tags like <p>. It is the inverse of htmlEncode()."""
    for code in codes:
        s = s.replace(code[1], code[0])
    return s
I'm not sure why they reverse the list; I think it has to do with the way they encode, so in your case it may not need to be reversed.
Also, if I were you I would change htmlCodes to be a list of tuples rather than a list of lists.
This is going in my library though :)
I noticed your title asked for encoding too, so here is Cheetah's encode function:
def htmlEncode(s, codes=htmlCodes):
    """ Returns the HTML encoded version of the given string. This is useful to
    display a plain ASCII text string on a web page."""
    for code in codes:
        s = s.replace(code[0], code[1])
    return s
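On why the list is reversed for decoding: '&amp;' has to be decoded last, otherwise text that was escaped once can end up unescaped twice. The sketch below is a small Python 3 illustration with a hypothetical helper named decode, not Cheetah's code.

```python
CODES = [("&", "&amp;"), ("<", "&lt;"), (">", "&gt;"), ('"', "&quot;")]


def decode(s, codes):
    """Replace each entity with its character, in the given order."""
    for char, entity in codes:
        s = s.replace(entity, char)
    return s


# The literal text "&lt;" is stored, after one round of escaping, as:
encoded = "&amp;lt;"

# Decoding '&amp;' first over-decodes the text to '<'.
print(decode(encoded, CODES))
# Decoding '&amp;' last restores the original text '&lt;'.
print(decode(encoded, list(reversed(CODES))))
```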

You can also use django.utils.html.escape
from django.utils.html import escape
something_nice = escape(request.POST['something_naughty'])

Below is a python function that uses module htmlentitydefs. It is not perfect. The version of htmlentitydefs that I have is incomplete and it assumes that all entities decode to one codepoint which is wrong for entities like &NotEqualTilde;:
http://www.w3.org/TR/html5/named-character-references.html
NotEqualTilde; U+02242 U+00338 ≂̸
With those caveats though, here's the code.
import re
import htmlentitydefs
def decodeHtmlText(html):
    """
    Given a string of HTML that would parse to a single text node,
    return the text value of that node.
    """
    # Fast path for common case.
    if html.find("&") < 0: return html
    return re.sub(
        '&(?:#(?:x([0-9A-Fa-f]+)|([0-9]+))|([a-zA-Z0-9]+));',
        _decode_html_entity,
        html)
def _decode_html_entity(match):
    """
    Regex replacer that expects hex digits in group 1, or
    decimal digits in group 2, or a named entity in group 3.
    """
    hex_digits = match.group(1)  # '&#x0a;' -> unichr(10)
    if hex_digits: return unichr(int(hex_digits, 16))
    decimal_digits = match.group(2)  # '&#16;' -> unichr(0x10)
    if decimal_digits: return unichr(int(decimal_digits, 10))
    name = match.group(3)  # name is 'lt' when '&lt;' was matched.
    if name:
        decoding = (htmlentitydefs.name2codepoint.get(name)
            # Treat &GT; like &gt;.
            # This is wrong for &Gt; and &Lt; which HTML5 adopted from MathML.
            # If htmlentitydefs included mappings for those entities,
            # then this code will magically work.
            or htmlentitydefs.name2codepoint.get(name.lower()))
        if decoding is not None: return unichr(decoding)
    return match.group(0)  # Treat "&noSuchEntity;" as "&noSuchEntity;"
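On Python 3.4+, the multi-code-point caveat mentioned above does not apply to the standard library's html.unescape, which works from the HTML5 entity table. A short check, assuming a Python 3 interpreter rather than the Python 2 code above:

```python
import html

decoded = html.unescape("&NotEqualTilde;")
# One entity, two code points: U+2242 followed by U+0338.
print([hex(ord(ch)) for ch in decoded])

# Unknown entities are left untouched.
print(html.unescape("&noSuchEntity;"))
```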

This is the easiest solution for this problem -
{% autoescape on %}
{{ body }}
{% endautoescape %}
From this page.

While looking for the simplest solution to this question in Django and Python, I found that you can use their built-in functions to escape/unescape HTML code.
Example
I saved your html code in scraped_html and clean_html:
scraped_html = (
    '&lt;img class=&quot;size-medium wp-image-113&quot; '
    'style=&quot;margin-left: 15px;&quot; title=&quot;su1&quot; '
    'src=&quot;http://blah.org/wp-content/uploads/2008/10/su1-300x194.jpg&quot; '
    'alt=&quot;&quot; width=&quot;300&quot; height=&quot;194&quot; /&gt;'
)
clean_html = (
    '<img class="size-medium wp-image-113" style="margin-left: 15px;" '
    'title="su1" src="http://blah.org/wp-content/uploads/2008/10/su1-300x194.jpg" '
    'alt="" width="300" height="194" />'
)
Django
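You need Django >= 1.0. Note that unescape_entities was deprecated in Django 3.0 and removed in 4.0; on those versions use html.unescape from the Python section below.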
unescape
To unescape your scraped html code you can use django.utils.text.unescape_entities which:
Convert all named and numeric character references to the corresponding unicode characters.
>>> from django.utils.text import unescape_entities
>>> clean_html == unescape_entities(scraped_html)
True
escape
To escape your clean html code you can use django.utils.html.escape which:
Returns the given text with ampersands, quotes and angle brackets encoded for use in HTML.
>>> from django.utils.html import escape
>>> scraped_html == escape(clean_html)
True
Python
You need Python >= 3.4
unescape
To unescape your scraped html code you can use html.unescape which:
Convert all named and numeric character references (e.g. &gt;, &#62;, &#x3e;) in the string s to the corresponding unicode characters.
>>> from html import unescape
>>> clean_html == unescape(scraped_html)
True
escape
To escape your clean html code you can use html.escape which:
Convert the characters &, < and > in string s to HTML-safe sequences.
>>> from html import escape
>>> scraped_html == escape(clean_html)
True
