Detecting newline character on user's input (web2py) - python

I have the following table:
db.define_table('comm',
Field('post','reference post', readable=False, writable=False),
Field('body','text', requires=IS_NOT_EMPTY()),
auth.signature
)
and in a python function, the following code:
form=SQLFORM(db.comm).process()
I render that form in the view returned by the Python function:
{{=form}}
The problem is when the user inputs two or more paragraphs, it doesn't detect the newline character. How can I fix that?

Use a <pre> tag to display the content whose newline characters you want to preserve.
<pre>
The HTML pre element (or HTML Preformatted Text) represents
preformatted text. Text within this element is typically displayed in
a non-proportional ("monospace") font exactly as it is laid out in the
file. Whitespace inside this element is displayed as typed.
{{for post in comments:}}
<pre>{{=post.body}}</pre>
{{pass}}

Assuming you are referring to the subsequent display of the user input in a view, you could use a <pre> tag: http://www.w3schools.com/tags/tag_pre.asp. However, you might need some CSS to get the font/styling you like (by default, the browser will use alternative styling with a fixed-width font).
You could also replace the newlines with <br> tags:
{{=XML(record.body.replace('\n', '<br>'), sanitize=True, permitted_tags=['br/'])}}
Because the text now contains <br> HTML tags, it is necessary to wrap it in XML() to prevent web2py from escaping the HTML -- but you also want to sanitize the text and allow only <br> tags to prevent malicious code from being executed.
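As a rough sketch of the same idea outside the template, the escaping and the newline replacement can be done in a plain Python helper. The name nl2br is hypothetical (borrowed from PHP) and is not part of web2py; the helper escapes the text itself with the standard library, so only the <br /> tags it inserts end up as real markup.

```python
import html


def nl2br(text):
    """Escape user text, then turn every line ending into a <br /> tag."""
    # Escape first so markup typed by the user is displayed literally.
    escaped = html.escape(text)
    # Normalise Windows (\r\n) and bare \r line endings to \n.
    normalised = escaped.replace("\r\n", "\n").replace("\r", "\n")
    return normalised.replace("\n", "<br />")


print(nl2br("first paragraph\r\n\r\n<b>second</b> paragraph"))
# first paragraph<br /><br />&lt;b&gt;second&lt;/b&gt; paragraph
```

In a web2py view the result would still need to be wrapped, for example {{=XML(nl2br(record.body))}}, since web2py escapes plain strings by default.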

Related

Add HTML linebreaks with Python regex

I need to add HTML linebreaks (<br />) to a string at all line endings which are not followed by a blank line. This simple pattern works:
body = re.sub(r'(.)\n(.)', r'\1<br />\2', body)
But I realized it will not work for an edge case where a line contains only a single character (because the character would have to be part of two different overlapping matches). So I tried the following pattern with lookaround subpatterns:
body = re.sub(r'(?<=.)\n(?=.)', r'<br />', body)
This works as intended, except that the HTML tag is added after the linebreak (\n), and with an additional linebreak:
linebreak
<br/>
!
<br/>
linebreak
<br/>
l
<br/>
works
I would expect that the matched linebreak is substituted by the HTML tag (thereby effectively removing the linebreaks from all matching areas) – why does the tag appear on a new line instead (i.e. increasing the number of linebreaks/lines)?
The equivalent pattern in vim does remove the linebreaks:
s:\(.\)\zs\n\ze\(.\):\<br \/\>:ge
This is quite embarrassing – the pattern/my script do indeed work as supposed to. I was fooled by an HTML source viewer which obviously adds linebreaks to the source code it should display unaltered. Sorry for taking your valuable time.
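For reference, a small sketch comparing the two patterns from the question on a made-up sample string. It shows the single-character edge case with the consuming pattern, and that the lookaround version replaces each newline in place without adding any lines.

```python
import re

# Sample text: single-character lines plus one blank line between paragraphs.
body = "linebreak\n!\nlinebreak\nl\nworks\n\nnew paragraph"

# Consuming pattern: the characters around \n are part of the match,
# so a one-character line cannot take part in two adjacent matches.
consuming = re.sub(r'(.)\n(.)', r'\1<br />\2', body)

# Lookaround pattern: only the \n itself is matched and replaced.
lookaround = re.sub(r'(?<=.)\n(?=.)', '<br />', body)

print(repr(consuming))
# 'linebreak<br />!\nlinebreak<br />l\nworks\n\nnew paragraph'
print(repr(lookaround))
# 'linebreak<br />!<br />linebreak<br />l<br />works\n\nnew paragraph'
```

The blank line is left untouched by both patterns, because "." does not match a newline.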

Error in HTML escaping with Jinja

I have the following regex that searches through text and prepends and appends HTML 'a' tags for the matched substring. It successfully does everything I want except when the HTML is escaped by using the 'safe' filter by Jinja. The regex is below:
re.sub(r'(^#\w*|(?<=\s)#\w*)',
r'\1',
'here is some #text with a #hashtag')
The above should come out here is some #text with a #hashtag
where '#text' and '#hashtag' are clickable links. However by using Jinja's 'safe' filter it comes out
"here is some "#text" with a "#hashtag
There are a few things to note:
Unmatched substrings are being wrapped in quotations
The html links should come out #hashtag<a> not <a href="{{ url_for(\'main.tag\', tagname=tag) }}">#hashtag
I'm confident it has to do with the string that is being processed by Jinja. I am not confident with how I am escaping specific characters in the string and passing it to Jinja to process.
Am I escaping the characters wrong? Thoughts? Thank you in advance.

Process whitespaces from user input

The only reason why I include Python in the question is that PHP has the nl2br function that inserts br tags, a similar function in Python could be useful, but I suspect that this problem can be solved with HTML and CSS.
So, I have a form that receives the user's input in a textarea. I save it to the database, which is Postgres, and then when I display it, it doesn't include the line breaks the user supplied to separate paragraphs.
I tried using the white-space CSS property on the paragraph tag:
white-space: pre
or
white-space: pre-wrap
But, this is weird, the result was separated lines but the first line aligned in the middle:
including text-align:left didn't solve the problem. I'm sure there is a simple solution to this.
I would suggest replacing newline characters with <br /> either before storing the text in the database (only once) or when fetching it (see comment).
With Python:
import re
myUserInput = re.sub('(?:\r\n|\r|\n)', '<br />', myUserInput)
With JavaScript (see jsfiddle):
myUserInput = myUserInput.replace(/(?:\r\n|\r|\n)/g, '<br />');

Django security. dealing with user input . Is html.strip_tags enough or should I use bleach?

I'm accepting user input on a small forum I have. This is what I do with user's input:
First, call "html.strip_tags" from django.utils.html on user's cleaned_data[input].
Save it to the database (PostgreSQL).
Query the text and use a regex to replace \n with br and display spaces entered by users.
Then, I do {{text|safe}} to display the text (if I don't mark it as safe, it won't display spaces between paragraphs but br tags).
Finally I use some jquery plugins on the text: Autolinker.js to detect and "urlize" hyperlinks and trunk8 to control its length.
So, because I do {{text|safe}} I am worried about malicious input, is html.strip_tags enough?
The documentation about strip_tags writes:
"Tries to remove anything that looks like an HTML tag from the string, that is anything contained within <>. Absolutely NO guaranty is provided about the resulting string being entirely HTML safe. So NEVER mark safe the result of a strip_tag call without escaping it first, for example with escape()."
The documentation about Python's Bleach:
"The primary goal of Bleach is to sanitize user input that is allowed to contain some HTML as markup and is to be included in the content of a larger page."
Because the user input is not allowed to contain any html, my guess is that Bleach is not needed.. but I am kind of noob so your suggestions will be appreciated.
Quoting the docs on striptags
No safety guarantee
Note that striptags doesn’t give any guarantee about its output being entirely HTML safe, particularly with non valid
HTML input. So NEVER apply the safe filter to a striptags output. If
you are looking for something more robust, you can use the bleach
Python library, notably its clean method.
I think the answer here is to use bleach to strip the tags, which is as easy as bleach.clean(text, tags=[], strip=True). Plus, with bleach.linkify you can take care of the URLs as well.
Regarding your general process, if the string is generated once and queried multiple times, why aren't you adding the line breaks and URLs while saving?
If the only reason you need to mark the input as "safe" is so that it will display your <br> tags that you inserted where users typed line breaks, then your best approach is to use the linebreaks filter. From the Django documentation:
linebreaks
Replaces line breaks in plain text with appropriate HTML; a single newline becomes an HTML line break (<br />) and a new line followed by a blank line becomes a paragraph break (</p>).
For example:
{{ value|linebreaks }}
If value is Joel\nis a slug, the output will be <p>Joel<br />is a slug</p>.
Instead of using a regex to replace newlines with <br>s in your database, just leave the data in there as the user entered it. Then, you can display it in a template with
{{ text|striptags|linebreaks }}
This will first remove (most) HTML tags from your user's input, then add in <br> and <p> tags for newlines. It does not mark the string as safe, though, so any tags left in the user's input will be escaped; only the tags created by linebreaks will have any effect.
(Note that if you don't want <p> tags, you can use the variant filter linebreaksbr).
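To make the behaviour of the filter concrete, here is a standard-library approximation of what linebreaks does when autoescaping is on. This is a sketch for illustration, not Django's own code, and the function name linebreaks_sketch is made up.

```python
import html
import re


def linebreaks_sketch(text):
    """Approximate the linebreaks filter: escape, then add <p> and <br />."""
    # Escape first so any tags in the user's input are shown as text.
    text = html.escape(text)
    # Normalise all line endings to \n.
    text = re.sub(r'\r\n|\r|\n', '\n', text)
    # Two or more consecutive newlines separate paragraphs.
    paragraphs = re.split(r'\n{2,}', text)
    # Remaining single newlines become <br /> inside each paragraph.
    return "\n\n".join(
        "<p>%s</p>" % p.replace("\n", "<br />") for p in paragraphs
    )


print(linebreaks_sketch("Joel\nis a slug"))
# <p>Joel<br />is a slug</p>
```

Because the escaping happens before the <p> and <br /> tags are added, only the tags produced by the function itself are live markup, which is the property the answer above relies on.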

New Line Character and PyQt

I'm using QtGui.QMessageBox to display messages, warnings and errors.
It seems that QMessageBox doesn't want to work with the "\n" newline character when the text contains HTML tags:
message = "<a href = http://www.google.com> GOOGLE</a> This a line number one.\n This a line number two. \n And this is a line number three."
is all being displayed as one long line when displayed within QMessageBox.
Thanks in advance!
The behaviour you are seeing is entirely as expected. It is part of the HTML 4 spec that, other than inside PRE tags, sequences of whitespace characters should always be collapsed to a single space. To quote the relevant part of the spec:
Note that a sequence of white spaces between words in the source
document may result in an entirely different rendered inter-word
spacing (except in the case of the PRE element). In particular, user
agents should collapse input white space sequences when producing
output inter-word space.
So, when you need to insert line-breaks, do it explicitly using the <br> tag.
PS:
It's also worth noting here that Qt's text widgets only support a limited set of HTML tags, attributes and CSS properties. For full details, see the Supported HTML Subset in the Qt docs.
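A minimal sketch of that fix, assuming the message is already trusted HTML (so nothing is escaped here). The helper name to_rich_text is hypothetical, and the QMessageBox call is left as a comment so the snippet runs without PyQt installed.

```python
def to_rich_text(message):
    """Make line breaks explicit so an HTML renderer does not collapse them."""
    # Normalise Windows line endings, then replace each newline with <br>.
    return message.replace("\r\n", "\n").replace("\n", "<br>")


message = ('<a href="http://www.google.com">GOOGLE</a> This is line number one.\n'
           'This is line number two.\n'
           'And this is line number three.')

rich = to_rich_text(message)
print(rich)

# With PyQt available, the converted text could then be shown, e.g.:
# QtGui.QMessageBox.information(None, "Title", rich)
```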
