python/genshi newline to html <p> paragraphs

python/genshi newline to html <p> paragraphs - python

I'm trying to output the content of a comment with genshi, but I can't figure out how to transform the newlines into HTML paragraphs.
Here's a test case of what it should look like:
input: 'foo\n\n\n\n\nbar\nbaz'
output: <p>foo</p><p>bar</p><p>baz</p>
I've looked everywhere for this function. I couldn't find it in genshi or in python's std lib. I'm using TG 1.0.

def tohtml(manylinesstr):
return ''.join("<p>%s</p>" % line
for line in manylinesstr.splitlines()
if line)
So for example,
print repr(tohtml('foo\n\n\n\n\nbar\nbaz'))
emits:
'<p>foo</p><p>bar</p><p>baz</p>'
as required.

There may be a built-in function in Genshi, but if not, this will do it for you:
output = ''.join([("<p>%s</p>" % l) for l in input.split('\n')])

I know you said TG1 my solution is TG2 but can be backported or simply depend on webhelpers but IMO all other implementations are flawed.
Take a look at the converters module both nl2br and format_paragraphs.

Related

Python - Remove 'style'-attribute from HTML

I have a String in Python, which has some HTML in it. Basically it looks like this.
>>> print someString # I get someString from the backend
"<img style='height:50px;' src='somepath'/>"
I try to display this HTML in a PDF. Because my PDF generator can't handle the styles-attribute (and no, I can't take another one), I have to remove it from the string. So basically, it should be like that:
>>> print someString # I get someString from the backend
"<img style='height:50px;' src='somepath'/>"
>>> parsedString = someFunction(someString)
>>> print parsedString
"<img src='somepath'/>"
I guess the best way to do this is with RegEx, but I'm not very keen on it. Can someone help me out?

I wouldn't use RegEx with this because
Regex is not really suited for HTML parsing and even though this is a simple one there could be many variations and edge cases you need to consider and the resulting regex could turn out to be a nightmare
Regex sucks. It can be really useful but honestly, they are the epitome of user unfriendly.
Alright, so how would I go about it. I would use trusty BeautifulSoup! Install with pip by using the following command:
pip install beautifulsoup4
Then you can do the following to remove the style:
from bs4 import BeautifulSoup as Soup
del Soup(someString).find('img')['style']
This first parses your string, then finds the img tag and then deletes its style attribute.
It should also work with arbitrary strings but I can't promise that. Maybe you will come up with an edge case.
Remember, using RegEx to parse an HTML string is not the best of ideas. The internet and Stackoverflow is full of answers why this is not possible.
Edit: Just for kicks you might want to check out this answer. You know stuff is serious when it is said that even Jon Skeet can't do it.

Using RegEx to work with HTML is a very bad idea but if you really want to use it, try this:
/style=["']?((?:.(?!["']?\s+(?:\S+)=|[>"']))+.)["']?/ig

How should I format a long url in a python comment and still be PEP8 compliant

In a block comment, I want to reference a URL that is over 80 characters long.
What is the preferred convention for displaying this URL?
I know bit.ly is an option, but the URL itself is descriptive. Shortening it and then having a nested comment describing the shortened URL seems like a crappy solution.

Don't break the url:
# A Foolish Consistency is the Hobgoblin of Little Minds [1]
# [1]: http://www.python.org/dev/peps/pep-0008/#a-foolish-consistency-is-the-hobgoblin-of-little-minds

From PEP8
But most importantly: know when to be inconsistent -- sometimes the style guide just doesn't apply. When in doubt, use your best judgment. Look at other examples and decide what looks best. And don't hesitate to ask!
Two good reasons to break a particular rule:
When applying the rule would make the code less readable, even for someone who is used to reading code that follows the rules.
Personally, I would use that advice, and rather leave the full descriptive URL in your comment for people.

You can use the # noqa at the end of the line to stop PEP8/Flake8 from running that check. This is allowed by PEP8 via:
Special cases aren't special enough to break the rules.

I'd say leave it...
PEP20:
Special cases aren't special enough to break the rules.
Although practicality beats purity.
It's more practical to be able to quickly copy/paste an url then to remove linebreaks when pasting into the browser.

If your are using flake8:
"""
long-url: http://stackoverflow.com/questions/10739843/how-should-i-format-a-long-url-in-a-python-comment-and-still-be-pep8-compliant
""" # noqa

Adding '# noqa' works, but if it's in a docstring the '# noqa' shows up in the docs when built with spinx. In which case, you can build a custom autodoc process method. See this answer.
Here's my adapted version of that,
from sphinx.application import Sphinx
import re
def setup(app):
noqa_regex = re.compile('^(.*)\s\s#\snoqa.*$')
def trim_noqa(app, what_, name, obj, options):
for i, line in enumerate(lines):
if noqa_regex.match(line):
new_line = noqa_regex.sub(r'\1', line)
lines[i] = new_line
app.connect('autodoc-process-docstring', trim_noqa)
return app

You use a url shortener like google's so from this:
http://www.python.org/dev/peps/pep-0008/#a-foolish-consistency-is-the-hobgoblin-of-little-minds
you get:
http://goo.gl/93ZLQ

My option would be:
URL = ('http://stackoverflow.com/questions/10739843/'
'how-should-i-format-a-long-url-in-a-python-'
'comment-and-still-be-pep8-compliant')

regular expression - function body extracting

in Python script, for every method definition in some C++ code of the form:
return_value ClassName::MethodName(args)
{MehodBody}
I need to extract three parts: the class name, the method name and the method body for further processing. Finding and extracting the ClassName and MethodName is easy, but is there any simple way to extract the body of the method? With all possible '{' and '}' inside it? Or are regexes unsuitable for such task?

>>> s = """return_value ClassName::MethodName(args)
{MehodBody {} } """
>>> re.findall(r'\b(\w+)::(\w+)\([^{]+\{(.+)}', s, re.S)
[('ClassName', 'MethodName', 'MehodBody {} ')]

I would recommend that you use the parser module rather than regexps since it will handle things like multiple line functions, different indentations and will abort on malformed input so that you can manage things better. "Avoid regexps if you can" is one of the rules I live by since they're often more trouble that they're worth.
Edit:
Oh okay. I misread your question. I thought you wanted to parse Python code itself. I googled a little bit and found this but it's C only. Perhaps you can extend that? The grammar for C++ is there in the "C++ programming language book"

Converting \n to <br> in mako files

I'm using python with pylons
I want to display the saved data from a textarea in a mako file with new lines formatted correctly for display
Is this the best way of doing it?
> ${c.info['about_me'].replace("\n", "<br />") | n}

The problem with your solution is that you bypass the string escaping, which can lead to security issues. Here is my solution :
<%! import markupsafe %>
${text.replace('\n', markupsafe.Markup('<br />'))}
or, if you want to use it more than once :
<%!
import markupsafe
def br(text):
return text.replace('\n', markupsafe.Markup('<br />'))
%>
${text | br }
This solution uses markupsafe, which is used by mako to mark safe strings and know which to escape. We only mark <br/> as being safe, not the rest of the string, so it will be escaped if needed.

It seems to me that is perfectly suitable.
Be aware that replace() returns a copy of the original string and does not modify it in place. So since this replacement is only for display purposes it should work just fine.
Here is a little visual example:
>>> s = """This is my paragraph.
...
... I like paragraphs.
... """
>>> print s.replace('\n', '<br />')
This is my paragraph.<br /><br />I like paragraphs.<br />
>>> print s
This is my paragraph.
I like paragraphs.
The original string remains unchanged. So... Is this the best way of doing it?
Ask yourself: Does it work? Did it get the job done quickly without resorting to horrible hacks? Then yes, it is the best way.

Beware as line breaks in <textarea>s should get submitted as \r\n according to http://www.w3.org/TR/REC-html40/interact/forms.html#h-17.13.4
To be safe, try s.replace('\r\n', '<br />') then s.replace('\n', '<br />').

This seems dangerous to me because it prints the whole string without escaping, which would allow arbitrary tags to be rendered. Make sure you cleanse the user's input with lxml or similar before printing. Beware that lxml will wrap in an HTML tag, it just can't handle things that aren't like that, so get ready to remove that manually or adjust your CSS to accommodate.

try this it will work:-
${c.info['about_me'] | n}

There is also a simply help function that can be called which will format and santize text correctly replacing \n for tags (see http://sluggo.scrapping.cc/python/WebHelpers/modules/html/converters.html).
In helpers.py add the following:
from webhelpers.html.converters import textilize
Then in your mako file simply say
h.textilize( c.info['about_me'], santize=True)
The santize=True just means that it will make sure any other nasty codes are escaped so users can't hack your site, as the default is False. Alternatively I make do a simple wrapper function in helpers so that santize=True is always defaults to True i.e.
from webhelpers.html.converters import textilize as unsafe_textilize
def textilize( value, santize=True):
return unsafe_textilize( value, santize )
This way you can just call h.textilize( c.info['about_me'] ) from your mako file, which if you work with lots of designers stops them from going crazy.

Sensible python source line wrapping for printout

I am working on a latex document that will require typesetting significant amounts of python source code. I'm using pygments (the python module, not the online demo) to encapsulate this python in latex, which works well except in the case of long individual lines - which simply continue off the page. I could manually wrap these lines except that this just doesn't seem that elegant a solution to me, and I prefer spending time puzzling about crazy automated solutions than on repetitive tasks.
What I would like is some way of processing the python source code to wrap the lines to a certain maximum character length, while preserving functionality. I've had a play around with some python and the closest I've come is inserting \\\n in the last whitespace before the maximum line length - but of course, if this ends up in strings and comments, things go wrong. Quite frankly, I'm not sure how to approach this problem.
So, is anyone aware of a module or tool that can process source code so that no lines exceed a certain length - or at least a good way to start to go about coding something like that?

You might want to extend your current approach a bit, but using the tokenize module from the standard library to determine where to put your line breaks. That way you can see the actual tokens (COMMENT, STRING, etc.) of your source code rather than just the whitespace-separated words.
Here is a short example of what tokenize can do:
>>> from cStringIO import StringIO
>>> from tokenize import tokenize
>>>
>>> python_code = '''
... def foo(): # This is a comment
... print 'foo'
... '''
>>>
>>> fp = StringIO(python_code)
>>>
>>> tokenize(fp.readline)
1,0-1,1: NL '\n'
2,0-2,3: NAME 'def'
2,4-2,7: NAME 'foo'
2,7-2,8: OP '('
2,8-2,9: OP ')'
2,9-2,10: OP ':'
2,11-2,30: COMMENT '# This is a comment'
2,30-2,31: NEWLINE '\n'
3,0-3,4: INDENT ' '
3,4-3,9: NAME 'print'
3,10-3,15: STRING "'foo'"
3,15-3,16: NEWLINE '\n'
4,0-4,0: DEDENT ''
4,0-4,0: ENDMARKER ''

I use the listings package in LaTeX to insert source code; it does syntax highlight, linebreaks et al.
Put the following in your preamble:
\usepackage{listings}
%\lstloadlanguages{Python} # Load only these languages
\newcommand{\MyHookSign}{\hbox{\ensuremath\hookleftarrow}}
\lstset{
% Language
language=Python,
% Basic setup
%basicstyle=\footnotesize,
basicstyle=\scriptsize,
keywordstyle=\bfseries,
commentstyle=,
% Looks
frame=single,
% Linebreaks
breaklines,
prebreak={\space\MyHookSign},
% Line numbering
tabsize=4,
stepnumber=5,
numbers=left,
firstnumber=1,
%numberstyle=\scriptsize,
numberstyle=\tiny,
% Above and beyond ASCII!
extendedchars=true
}
The package has hook for inline code, including entire files, showing it as figures, ...

I'd check a reformat tool in an editor like NetBeans.
When you reformat java it properly fixes the lengths of lines both inside and outside of comments, if the same algorithm were applied to Python, it would work.
For Java it allows you to set any wrapping width and a bunch of other parameters. I'd be pretty surprised if that didn't exist either native or as a plugin.
Can't tell for sure just from the description, but it's worth a try:
http://www.netbeans.org/features/python/

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

python/genshi newline to html <p> paragraphs - python

def tohtml(manylinesstr): return ''.join("<p>%s</p>" % line for line in manylinesstr.splitlines() if line) So for example, print repr(tohtml('foo\n\n\n\n\nbar\nbaz')) emits: '<p>foo</p><p>bar</p><p>baz</p>' as required.

There may be a built-in function in Genshi, but if not, this will do it for you: output = ''.join([("<p>%s</p>" % l) for l in input.split('\n')])

I know you said TG1 my solution is TG2 but can be backported or simply depend on webhelpers but IMO all other implementations are flawed. Take a look at the converters module both nl2br and format_paragraphs.

Related

Python - Remove 'style'-attribute from HTML

How should I format a long url in a python comment and still be PEP8 compliant

regular expression - function body extracting

Converting \n to <br> in mako files

Sensible python source line wrapping for printout

Categories

Resources