Genshi: for loop inserts line breaks - python

Source code: I have the following program.
import genshi
from genshi.template import MarkupTemplate
html = '''
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:py="http://genshi.edgewall.org/">
<head>
</head>
<body>
<py:for each="i in range(3)">
<py:choose>
<del py:when="i == 1">
${i}
</del>
<py:otherwise>
${i}
</py:otherwise>
</py:choose>
</py:for>
</body>
</html>
'''
template = MarkupTemplate(html)
stream = template.generate()
html = stream.render('html')
print(html)
Expected output: the numbers are printed consecutively with no whitespace (and most critically no line-break) between them.
<html>
<head>
</head>
<body>
0<del>1</del>2
</body>
</html>
Actual output: It outputs the following:
<html>
<head>
</head>
<body>
0
<del>1</del>
2
</body>
</html>
Question: How do I eliminate the line-breaks? I can deal with the leading whitespace by stripping it from the final HTML, but I don't know how to get rid of the line-breaks. I need the contents of the for loop to be displayed as a single continuous "word" (e.g. 012 instead of 0 \n 1 \n 2).
What I've tried:
Reading the Genshi documentation.
Searching StackOverflow
Searching Google
Using a <?python ...code... ?> code block. This doesn't work since the carets in the <del> tags are escaped and displayed.
<?python
def numbers():
n = ''
for i in range(3):
if i == 1:
n += '<del>{i}</del>'.format(i=i)
else:
n += str(i)
return n
?>
${numbers()}
Produces 0<del>1</del>2
I also tried this, but using genshi.builder.Element('del') instead. The results are the same, and I was able to conclusively determine that the string returned by numbers() is being escaped after the return occurs.
A bunch of other things that I can't recall at the moment.

Not ideal, but I did finally find an acceptable solution. The trick is to put the closing caret for a given tag on the next line right before the next tag's opening caret.
<body>
<py:for each="i in range(3)"
><py:choose
><del py:when="i == 1">${i}</del
><py:otherwise>${i}</py:otherwise
></py:choose
</py:for>
</body>
Source: https://css-tricks.com/fighting-the-space-between-inline-block-elements/
If anyone has a better approach I'd love to hear it.

Related

HTML Output in Pyscript

I was experimenting in Pyscript and I tried to print an HTML table, but it didn't work. It seems to delete the tags and mantain just the plain text.
Why is that? I tried to search online, but being a new technology i didn't find much.
This is my code:
<py-script>
print("<table>")
for i in range (2):
print("<tr>")
for j in range (2):
print("<td>test</td>")
print("</tr>")
print("</table>")
</py-script>
And this is the output I get:
I tried to replace the print() method with the pyscript.write() method, but it didn't work too.
I dig in source code pyscript.py
and at this moment works for me only code similar to JavaScript
For example this adds <h1>Hello</h1>
<div id="output"></div>
<py-script>
element = document.createElement('h1')
element.innerText = "Hello"
document.getElementById("output").append(element)
</py-script>
Full working code
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>PyScript Demo</title>
<!--<link rel="stylesheet" href="https://pyscript.net/alpha/pyscript.css" />-->
<script defer src="https://pyscript.net/alpha/pyscript.js"></script>
</head>
<body>
<div id="output"></div>
<py-script>
element = document.createElement('h1')
element.innerText = "Hello"
document.getElementById("output").append(element)
</py-script>
</body>
</html>
EDIT:
After digging in source code I found that pyscript.js runs function htmlDecode() which removes all tags from code in <py-script> (and probably it also removes tags when you load code from file) and this makes problem.
See Pyscript issue: [BUG] print() doesn't output HTML tags. · Issue #347 · pyscript/pyscript
Some workaround is to use some replacement - ie. {{ }} instead of < > in code - and later use code to replace it back to < >
print( "{{h1}}Hello{{/h1}}".replace("{{", "<").replace("}}", ">") )
or more universal - using function for this
def HTML(text):
return text.replace("{{", "<").replace("}}", ">")
print( HTML("{{h1}}Hello{{/h1}}") )
pyscript.write(some_id, HTML("{{h1}}Hello{{/h1}}") )
document.getElementById(some_id).innerHTML = HTML("{{h1}}Hello{{/h1}}")
Sometimes problem can be also pyscript.css which redefines some items and ie. <h1> looks like normal text.
One solution is to remove pyscript.css.
Other solution is to use classes from pyscript.css like in examples/index.html
<h1 class="text-4xl font-bold">Hello World</h1>
which means
print( HTML('{{h1 class="text-4xl font-bold"}}Hello{{/h1}}') )

Transition my results from python to html file

I need to transition my results from python to html file. How i can do this. I used format function. I know about split html, hybrid of Python and HTML. Both metod cannot be use. My project is big and i need operate on variables to move results from many function to HTML report file.
The code below is a simplified example of what I need to do
def function(n):
result = n + 5
return result
def main():
n = int(input('n: '))
result = function(n)
report = open('result.html', 'w')
html = """
<!DOCTYPE html>
<html>
<body>
<h1>First Head</h1>
<p>My result: ---> I need my result here <--- </p>
</body>
</html>
"""
report.write(html)
report.close()
if __name__ == '__main__':
main()
So you want the results from that function in the middle? Not that hard um. Really no fan of multiple line strings with """ but here goes...
html = """
<!DOCTYPE html>
<html>
<body>
<h1>First Head</h1>
<p>"""
html += result
html += """</p>
</body>
</html>"""
OR you could make it an easier oneliner... with "{}".format():
html = "<!DOCTYPE html>\n<html>\n<body>\n\n<h1>First Head</h1>\n<p>{}</p>\n\n</body>\n</html>".format(result)

Check an HTML document for proper opening and closing tags

I have been learning about stacks (which I created a class for) in Python lately and I learned that you can use them to check if parentheses are balanced, which is each opening symbol has a corresponding closing symbol and the pairs of parentheses are properly nested.
Now I am trying to use a stack to do the same but with HTML. So for example, my program would take the following document-
<html>
<head>
<title>
Example
</title>
</head>
<body>
<h1>Hello, world</h1>
</body>
</html>
and check it to make sure that it has proper opening and closing tags.
I just don't even have an idea on where to start and I'm very confused. Any help is appreciated.
If you want to approach the problem this way, think about what you did before:
You used a stack to keep track of (, [, or {.
What's the difference between ( and <html>? One of these is just a collection of characters. So change your code - instead of reading a single character and placing that on the stack, read the tag and place that on the stack instead.
You may also want to decide if you want to ensure that you've got some valid(ish) HTML - i.e. what happens if you encounter <<html>?
#Old interesting question:use usual stack for pop, push, and peek
#inserted space for each ">x" and "x<" where x is any char,
import re
text='''<html>
<head>
<title>
Example
</title>
</head>
<body>
<h1>Hello, world</h1>
</body>
</html>'''
def check_html_balance(text):
s=Stack()
x=text.replace('<', ' <'); y=x.replace('>', '> ')
a=[w for w in y.split() if re.search('<\S+>',w)]
b=[w for w in a if "/" not in w]
c=[w for w in a if "/" in w]
for w in text.split():
if w in b:
s.push(w)
elif not s.is_empty() and (w in c) and (w.replace('/','')==s.peek()):
s.pop()
return s.is_empty()
check_html_balance(text)
# A second, shorter solution without replacing "x<" by "x <"...
text='''<html>
<head>
<title>
Example
</title>
</head>
<body>
<h1>Hello, world</h1>
</body>
</html>'''
import re
def check_balance(text):
L=re.split('(<[^>]*>)', text)[1::2]
s=Stack()
for word in L:
if ('/' not in word):
s.push(word)
elif not s.is_empty() and (word.replace('/','')==s.peek()):
s.pop()
return s.is_empty()
check_balance(text)

How to wrap custom <root> element around whole HTML document?

I have a large number of HTML documents that must be converted to XML. Not all may look exactly the same. For example, the sample below ends with an HTML comment tag, not with the HTML tag.
Note this question is related to this one.
Here is my code:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<comment>this is an HTML comment</comment>
<comment>this is another HTML comment</comment>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
...
<comment>here is a comment inside the head tag</comment>
</head>
<body>
...
<comment>Comment inside body tag</comment>
<comment>Another comment inside body tag</comment>
<comment>There could be many comments in each file and scattered, not just 1 in the head and three in the body. This is just a sample.</comment>
</body>
</html>
<comment>This comment is the last line of the file</comment>
I wish to wrap the entire document with a custom tag called <root>. So far, the best I can do is wrap <root> around <html>.
root_tag = bs4.Tag(name="root")
soup.html.wrap(root_tag)
How can I position the <root> element such that it wraps the entire document?
A little crude, as this is just wrapping any given file in <root> </root>
See if it works for your use case:
def root_wrap(file):
fin = open(file, 'r+')
fin.write('<root>')
for line in fin:
fin.write(line)
fin.write('</root>')
fin.close()

HTMLParsing in Python

So i have a need to process some HTML in Python, and my requirement is that i need to find a certain tag and replace it with different charecter based on the content of the charecters...
<html>
<Head>
</HEAD>
<body>
<blah>
<_translate attr="french"> I am no one,
and no where <_translate>
<Blah/>
</body>
</html>
Should become
<html>
<Head>
</HEAD>
<body>
<blah>
Je suis personne et je suis nulle part
<Blah/>
</body>
</html>
I would like to leave the original HTML untouched an only replace the tags labeled 'important-tag'. Attributes and the contents of that tag will be important to generate the tags output.
I had though about using extending HTMLParser Object but I am having trouble getting out the orginal HTML when i want it. I think what i most want is to parse the HTML into tokens, with the orginal text in each token so i can output my desired output ... i.e. get somthing like
(tag, "<html>")
(data, "\n ")
(tag, "<head>")
(data, "\n ")
(end-tag,"</HEAD>")
ect...
ect...
Anyone know of a good pythonic way to accomplish this ? Python 2.7 standard libs are prefered, third party would also be useful to consider...
Thanks!
You can use lxml to perform such a task http://lxml.de/tutorial.html and use XPath to navigate easily trough your html:
from lxml.html import fromstring
my_html = "HTML CONTENT"
root = fromstring(my_html)
nodes_to_process = root.xpath("//_translate")
for node in nodes_to_process:
lang = node.attrib["attr"]
translate = AWESOME_TRANSLATE(node.text, lang)
node.parent.text = translate
I'll leave up to you the implementation of the AWESOME_TRANSLATE function ;)

Categories

Resources