Process whitespaces from user input - python

The only reason I include Python in the question is that PHP has the nl2br function, which inserts <br> tags, and a similar function in Python could be useful. However, I suspect this problem can be solved with HTML and CSS.
So, I have a form that receives the user's input in a textarea. I save it to the database, which is Postgres, and when I display it, it doesn't include the line breaks the user entered to separate paragraphs.
I tried using the white-space CSS property on the paragraph tag:
white-space: pre
or
white-space: pre-wrap
But, strangely, the lines were separated, yet the first line was aligned in the middle.
Adding text-align: left didn't solve the problem. I'm sure there is a simple solution to this.

I would suggest replacing newline characters with <br /> either before storing the text in the database (so it is done only once) or when fetching it.
With Python:
import re
# Replace every newline style (Windows \r\n, old Mac \r, Unix \n) with an HTML line break
myUserInput = re.sub(r'\r\n|\r|\n', '<br />', myUserInput)
With JavaScript:
myUserInput = myUserInput.replace(/(?:\r\n|\r|\n)/g, '<br />');
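Since the question mentions PHP's nl2br, here is a minimal Python sketch of a similar helper. The name nl2br is hypothetical (Python's standard library has no such function). It assumes the result will later be output unescaped (for example, marked safe in a template), so it runs html.escape first; that way any markup the user typed is shown as text instead of being interpreted by the browser.

```python
import html
import re


def nl2br(text: str) -> str:
    """Hypothetical helper, loosely modelled on PHP's nl2br.

    Escapes HTML special characters first, then inserts <br /> before
    each line ending. The line ending itself is kept, as PHP's nl2br does.
    """
    escaped = html.escape(text)
    # \r\n is listed first so a Windows line ending counts as one break
    return re.sub(r'(\r\n|\r|\n)', r'<br />\1', escaped)


print(nl2br('First paragraph\n\nSecond <b>line</b>'))
```

Storing the raw text and calling such a helper at display time keeps the database content free of markup; converting once before saving, as suggested above, trades that for less work per request.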

Related

Add HTML linebreaks with Python regex

I need to add HTML linebreaks (<br />) to a string at all line endings which are not followed by a blank line. This simple pattern works:
body = re.sub(r'(.)\n(.)', r'\1<br />\2', body)
But I realized it will not work for an edge case where a line contains only a single character (because the character would have to be part of two different overlapping matches). So I tried the following pattern with lookaround subpatterns:
body = re.sub(r'(?<=.)\n(?=.)', r'<br />', body)
This works as intended, except that the HTML tag is added after the linebreak (\n), and with an additional linebreak:
linebreak
<br/>
!
<br/>
linebreak
<br/>
l
<br/>
works
I would expect that the matched linebreak is substituted by the HTML tag (thereby effectively removing the linebreaks from all matching areas) – why does the tag appear on a new line instead (i.e. increasing the number of linebreaks/lines)?
The equivalent pattern in vim does remove the linebreaks:
s:\(.\)\zs\n\ze\(.\):\<br \/\>:ge
This is quite embarrassing: the pattern and my script do indeed work as intended. I was fooled by an HTML source viewer, which evidently adds line breaks to the source code it should display unaltered. Sorry for taking your valuable time.
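A quick way to check such a substitution without relying on any viewer is to print the repr() of the result, which shows every remaining newline as \n. The sample string below is made up to resemble the question's output; it is not taken from the original data.

```python
import re

# Made-up sample: single-character lines plus one blank line
body = "linebreak\n!\nlinebreak\nl\nworks\n\nnew paragraph"

# Replace a newline only when a non-newline character precedes and follows it
result = re.sub(r'(?<=.)\n(?=.)', '<br />', body)

# repr() shows the real string, escape sequences included,
# so no display tool can add or hide line breaks
print(repr(result))
```

Because the lookarounds only test their neighbours without consuming them, a single-character line such as "l" can satisfy both the lookbehind of one match and the lookahead of the next, and the blank line (two consecutive newlines) is left untouched.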

How to insert indents into html code tag?

I'm trying to put some Python code in an HTML document, using the code tag. For example:
<code>
for iris, species in zip(irises, classification):
    if species == 0:
        print(f'Flower {iris} is Iris-setosa')
</code>
The problem is that the page doesn't show new lines and indents. I can handle new lines with the br tag, but I didn't find anything for indents. I tried the pre tag, but then I have to remove all indentation from the HTML document around the code, and with several levels of indentation it starts to look very ugly. I could probably use &nbsp;, but using 4, 8 or 12 of them in one line doesn't seem like a good idea. Is there anything else I can do to format my code?
The browser collapses whitespace characters in the source when rendering. You can use <pre> or <br/>, or fake it with CSS. The solution you proposed is also valid and works, but as you stated, it is ugly. If you go that way, you can use the &Tab; character reference, which creates a tab indent; it makes more sense than four &nbsp; entities, but you still need to put it inside a <pre> tag so that the whitespace is not collapsed.
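If the HTML is generated from Python, the indentation problem with <pre> can be sidestepped by dedenting the snippet before writing it out. A minimal sketch, assuming a hypothetical helper named code_to_html: textwrap.dedent strips the indentation common to all lines, so the snippet can stay indented in your own source, and html.escape keeps characters such as <, > and & from being parsed as markup.

```python
import html
import textwrap


def code_to_html(source: str) -> str:
    """Hypothetical helper: wrap a source snippet in <pre><code>."""
    # Remove the common leading indentation and the surrounding blank lines
    body = textwrap.dedent(source).strip('\n')
    # Escape the code so it is displayed literally, not parsed as HTML
    return '<pre><code>' + html.escape(body) + '</code></pre>'


snippet = """
        for iris, species in zip(irises, classification):
            if species == 0:
                print(f'Flower {iris} is Iris-setosa')
        """

html_out = code_to_html(snippet)
print(html_out)
```

Only the relative indentation (4 and 8 spaces here) survives, which is exactly what <pre> then displays as typed.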

Detecting newline character on user's input (web2py)

I have the following table:
db.define_table('comm',
Field('post','reference post', readable=False, writable=False),
Field('body','text', requires=IS_NOT_EMPTY()),
auth.signature
)
and, in a Python function, the following code:
form=SQLFORM(db.comm).process()
I render that form in the view returned by the function:
{{=form}}
The problem is that when the user inputs two or more paragraphs, the newline characters are not detected. How can I fix that?
Use the pre tag to display the content in which you want newline characters to be respected. About <pre>:
The HTML pre element (or HTML Preformatted Text) represents preformatted text. Text within this element is typically displayed in a non-proportional ("monospace") font exactly as it is laid out in the file. Whitespace inside this element is displayed as typed.
For example, in the view:
{{for post in comments:}}
<pre>{{=post.body}}</pre>
{{pass}}
Assuming you are referring to the subsequent display of the user input in a view, you could use a <pre> tag: http://www.w3schools.com/tags/tag_pre.asp. However, you might need some CSS to get the font/styling you like (by default, the browser will use alternative styling with a fixed-width font).
You could also replace the newlines with <br> tags:
{{=XML(record.body.replace('\n', '<br>'), sanitize=True, permitted_tags=['br/'])}}
Because the text now contains <br> HTML tags, it is necessary to wrap it in XML() to prevent web2py from escaping the HTML -- but you also want to sanitize the text and allow only <br> tags to prevent malicious code from being executed.

Django security: dealing with user input. Is html.strip_tags enough, or should I use bleach?

I'm accepting user input on a small forum I have. This is what I do with the user's input:
First, I call "html.strip_tags" from django.utils.html on the user's cleaned_data[input].
Then I save it to the database (Postgres).
I query the text and use a regex to replace \n with <br> and to preserve the spaces entered by users.
Then, I use {{text|safe}} to display the text (if I don't mark it as safe, the <br> tags are shown literally instead of separating the paragraphs).
Finally, I use some jQuery plugins on the text: Autolinker.js to detect and "urlize" hyperlinks, and trunk8 to control its length.
So, because I use {{text|safe}}, I am worried about malicious input. Is html.strip_tags enough?
The documentation about strip_tags writes:
"Tries to remove anything that looks like an HTML tag from the string, that is anything contained within <>. Absolutely NO guaranty is provided about the resulting string being entirely HTML safe. So NEVER mark safe the result of a strip_tag call without escaping it first, for example with escape()."
The documentation about Python's Bleach:
"The primary goal of Bleach is to sanitize user input that is allowed to contain some HTML as markup and is to be included in the content of a larger page."
Because the user input is not allowed to contain any HTML, my guess is that Bleach is not needed, but I am a bit of a beginner, so your suggestions will be appreciated.
Quoting the docs on striptags:
No safety guarantee
Note that striptags doesn’t give any guarantee about its output being entirely HTML safe, particularly with non valid HTML input. So NEVER apply the safe filter to a striptags output. If you are looking for something more robust, you can use the bleach Python library, notably its clean method.
I think the answer here is to use bleach to strip the tags, as easy as bleach.clean(text, tags=[], strip=True) (without strip=True, disallowed tags are escaped rather than removed). Plus, with bleach.linkify you can take care of the URLs as well.
Regarding your general process: if the string is generated once and queried many times, why aren't you adding the line breaks and URLs when saving it?
If the only reason you need to mark the input as "safe" is so that it will display your <br> tags that you inserted where users typed line breaks, then your best approach is to use the linebreaks filter. From the Django documentation:
linebreaks
Replaces line breaks in plain text with appropriate HTML; a single newline becomes an HTML line break (<br />) and a new line followed by a blank line becomes a paragraph break (</p>).
For example:
{{ value|linebreaks }}
If value is Joel\nis a slug, the output will be <p>Joel<br />is a slug</p>.
Instead of using a regex to replace newlines with <br>s in your database, just leave the data in there as the user entered it. Then, you can display it in a template with
{{ text|striptags|linebreaks }}
This will first remove (most) HTML tags from your user's input, then add in <br> and <p> tags for newlines. It does not mark the string as safe, though, so any tags left in the user's input will be escaped; only the tags created by linebreaks will have any effect.
(Note that if you don't want <p> tags, you can use the variant filter linebreaksbr).
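To make the behaviour described for linebreaks concrete, here is a standard-library-only sketch that approximates it. linebreaks_sketch is a hypothetical name and this is not Django's own code; it escapes the text first (as Django does when autoescaping is on), splits paragraphs on blank lines, and turns the remaining single newlines into <br />, matching the Joel example quoted above.

```python
import html
import re


def linebreaks_sketch(text: str) -> str:
    """Hypothetical approximation of Django's linebreaks filter."""
    # Normalise Windows and old-Mac line endings to '\n'
    text = re.sub(r'\r\n|\r', '\n', text)
    # Two or more consecutive newlines separate paragraphs
    paragraphs = re.split(r'\n{2,}', text)
    # Escape each paragraph, then turn the remaining single newlines into <br />
    return '\n\n'.join(
        '<p>%s</p>' % html.escape(p).replace('\n', '<br />') for p in paragraphs
    )


print(linebreaks_sketch('Joel\nis a slug'))
```

Escaping before inserting the tags is the key design choice: the only markup in the result is the <p> and <br /> tags the function itself adds, which is why the raw user text can stay in the database unchanged.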

Python Markdown nl2br extension

I'm a beginner programmer, and I've been trying to use the Python-Markdown library in my web app. Everything works fine except the nl2br extension.
When I convert a text file to HTML using md.convert(text), it doesn't seem to convert newlines to <br>.
For example, before I convert, the text is:
Puerto Rico
===========
------------------------------
### Game Rules
hello world!
after I convert, I get:
<h1>Puerto Rico</h1>
<hr />
<h3>Game Rules</h3>
<p>hello world!</p>
My understanding is that the blank lines are represented by '\n' and should be converted to <br>, but I'm not getting that result. Here's my code:
import markdown
# Note: safe_mode was deprecated in Python-Markdown 2.5 and removed in 3.0
md = markdown.Markdown(safe_mode='escape', extensions=['nl2br'])
html = md.convert(text)
Please let me know if you have any idea or can point me in the right direction. Thank you.
Try adding two or more spaces at the end of a line to insert <br/> tags.
Example:
hello
world
results in
<p>hello <br>
world</p>
Notice that there are two spaces after the word hello. This only works if there is some text before the two spaces at the end of a line. This has nothing to do with the nl2br extension, though; it is standard Markdown.
My advice is: if you don't explicitly have to do this conversion, just don't do it. Using paragraphs (<p> tags) is the cleaner way to separate text regions.
If you simply want to have more space after your <h3> headlines then define some css for it:
h3 { margin-bottom: 4em; }
Imagine you do the spacing with <br> tags after your headlines in all 500 of your wiki pages, and later you decide that it's 20px too much space. Then you have to edit every page by hand and remove two <br> tags from each one. Otherwise, you just edit one line in a CSS file.
I found this question while looking for a clarification myself, so I'm adding an update despite being 7 years late.
Reference thread on the Python Markdown Project: https://github.com/Python-Markdown/markdown/issues/707
It turns out that this is indeed the expected behaviour: the nl2br extension converts only single newlines occurring within a block, not the newlines around it. This means that
This is
a block

This is a different
block
gets converted to
<p>This is<br/>a block</p>\n<p>This is a different<br/>block</p>
but when you have distinct, separate blocks,
Puerto Rico
===========
------------------------------
### Game Rules
hello world!
all surrounding newlines are collapsed, and no <br/> tags are injected.
<h1>Puerto Rico</h1>
<hr />
<h3>Game Rules</h3>
<p>hello world!</p>
