Using Template's safe_substitute Function in a Loop - Python

I have this simple code:
from string import Template
import io
import os
html_string = '''<html lang="en-US">
<head>
<title>My Python articles</title>
</head>
<body>'''
for i in range(2):
    html_string += '''
<p>
<span style="white-space: pre-line">$''' + str(i) + '''</span>
</p>'''
html_string += '''</body>
</html>'''
html_template = Template(html_string)
output_dir = "./html/"
output_path = os.path.join(output_dir, 'my_page.html')
with io.open(output_path, 'w+', encoding='UTF-8', errors='replace') as html_output:
    for i in range(2):
        html_output.write(html_template.safe_substitute(i="Hello"))
    html_output.truncate()
It looks like the i in html_output.write(html_template.safe_substitute(i="Hello")) doesn't correspond to the i in the for loop, and all I get is:
$0
$1
$0
$1
$0 and $1 should each appear only once, and each of them has to be replaced with the word Hello. Later I'll replace $0 and $1 with different inputs.

The docs for template strings have this to say about substitution identifiers:
By default, "identifier" is restricted to any case-insensitive ASCII alphanumeric string (including underscores) that starts with an underscore or ASCII letter.
Identifiers like "$0" and "$1" don't satisfy this condition, because they start with an ASCII digit.
Inserting a letter between the "$" and the digit like this ought to work:
import os
from string import Template
html_string = '''<html lang="en-US">
<head>
<title>My Python articles</title>
</head>
<body>'''
# Make substitution identifiers like "$T0", "$T1", ...
for i in range(2):
    html_string += '''
<p>
<span style="white-space: pre-line">$T''' + str(i) + '''</span>
</p>'''
html_string += '''</body>
</html>'''
html_template = Template(html_string)
# Map identifiers to values
mapping = {'T' + str(i): 'Hello' for i in range(2)}
output_dir = "./html/"
os.makedirs(output_dir, exist_ok=True)  # make sure the output directory exists
output_path = os.path.join(output_dir, 'my_page.html')
# Substitute once and write the result once ('w' already truncates the file)
with open(output_path, 'w', encoding='UTF-8', errors='replace') as html_output:
    html_output.write(html_template.safe_substitute(mapping))
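To see the identifier rule in isolation, here is a small sketch (the template strings and the `inputs` list are made up for illustration). It shows that `safe_substitute` silently leaves an invalid placeholder such as `$0` in place, while `substitute` raises `ValueError` for it, and how a mapping built with `enumerate` can give each `$T<n>` placeholder a different value, which covers the "different input later" case from the question:

```python
from string import Template

# "$0" is not a valid identifier (it starts with a digit), so safe_substitute()
# treats it as an invalid placeholder and leaves it in the output unchanged.
bad = Template("<span>$0</span>")
print(bad.safe_substitute({"0": "Hello"}))

# substitute() is stricter: the same invalid placeholder raises ValueError.
try:
    bad.substitute({"0": "Hello"})
except ValueError as err:
    print("ValueError:", err)

# With a letter prefix, "$T0" and "$T1" are valid and can get different values.
inputs = ["Hello", "World"]  # hypothetical per-placeholder inputs
good = Template("<span>$T0</span> <span>$T1</span>")
mapping = {f"T{i}": value for i, value in enumerate(inputs)}
result = good.safe_substitute(mapping)
print(result)
```

Using `safe_substitute` is convenient while the mapping is still incomplete, but it also hides typos in placeholder names; switching to `substitute` once all values are known makes such mistakes fail loudly.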

Related

Substring any kind of HTML String

I need to split any kind of HTML code (a string) into a list of tokens.
For example:
"<abc/><abc/>" #INPUT
["<abc/>", "<abc/>"] #OUTPUT
or
"<abc comfy><room /></abc> <br /> <abc/> " # INPUT
["<abc comfy><room /></abc>", "<br />", "<abc/>"] # OUTPUT
or
"""<meta charset="utf-8" /><title> test123 </title><meta name="test" content="index,follow" /><meta name="description" content="Description" /><link rel="stylesheet" href="../layout/css/default.css" />""" # INPUT
[
'<meta charset="utf-8" />',
"<title> test123 </title>",
'<meta name="test" content="index,follow" />',
'<meta name="description" content="Description" />',
'<link rel="stylesheet" href="../layout/css/default.css" />',
] # OUTPUT
What I tried:
from typing import List
def split(html: str) -> List[str]:
    if html == "":
        return []
    delimiter = "/>"
    split_name = html.split(" ", maxsplit=1)[0]
    name = split_name[1:]
    delimited_list = [character + delimiter for character in html.split(delimiter) if character]
    rest = html.split(" ", maxsplit=1)[1]
    char_delim = html.find("</")
    ### Help
    print(delimited_list)
    return delimited_list
My output:
['<abc/>', '<abc/>']
['<abc comfy><room />', '</abc> <br />', ' <abc/>', ' />']
['<meta charset="utf-8" />', '<title> test123</title><meta name="test" content="index,follow" />', '<meta name="description" content="Description123" />', '<link rel="stylesheet" href="../xx/css/default.css" />']
So I tried splitting at "/>", which works for the first case. Then I tried several other things, e.g. identifying the "name", i.e. the first identifier of the HTML string, such as "abc".
Do you guys have any idea how to continue?
Thanks!
Greetings
Nick
You will need a stack: iterate over the string, push the position of each opening tag onto the stack, and when you encounter a closing tag, assume that either:
1) its name matches the name of the tag beginning at the position on the top of the stack, or
2) it is a self-closing tag.
We also maintain a result list to save the parsed substrings. To get only the top-level elements (as in your expected output, where <room /> stays inside <abc comfy>), we save a substring only when the stack is empty.
For 1), we pop the position on the top of the stack; if the stack is now empty, we save the substring from the popped position to the end of the closing tag to the result list.
For 2), we do not modify the stack, and save the self-closing tag substring to the result list if the stack is empty.
After encountering any tag (opening, closing, self-closing), we advance the iterator (a.k.a. the current position pointer) by the length of that tag (from < to the corresponding >).
If the HTML string from the iterator onward does not start with any tag, we simply advance the iterator by one (we crawl until we can match a tag again).
Here is my attempt:
import re
# The optional attribute group "(?:\s[^>]*)?" matches the same text as "(?:\s[^>]*)*"
# but avoids needless backtracking on long tags.
openingTagPattern = re.compile(r"<([a-zA-Z]+)(?:\s[^>]*)?(?<!\/)>")
closingTagPattern = re.compile(r"<\/([a-zA-Z]+).*?>")
selfClosingTagPattern = re.compile(r"<([a-zA-Z]+).*?\/>")
def split(html):
    result = []  # top-level elements found so far
    stack = []   # start positions of the tags that are currently open
    i = 0
    while i < len(html):
        match = openingTagPattern.match(html, i)
        if match:  # opening tag
            stack.append(i)  # push position of start of opening tag onto stack
            i = match.end()
            continue
        match = closingTagPattern.match(html, i)
        if match:  # closing tag
            i = match.end()
            if stack:  # ignore a stray closing tag with no opening tag
                start = stack.pop()  # position of start of corresponding opening tag
                if not stack:  # back at top level: save the whole element
                    result.append(html[start:i])
            continue
        match = selfClosingTagPattern.match(html, i)
        if match:  # self-closing tag
            start = i
            i = match.end()
            if not stack:  # nested self-closing tags stay inside their parent
                result.append(html[start:i])
            continue
        i += 1  # otherwise crawl until we can match a tag
    return result  # reached the end of the string
Usage:
delimitedList = split("""<meta charset="utf-8" /><title> test123 </title><meta name="test" content="index,follow" /><meta name="description" content="Description" /><link rel="stylesheet" href="../layout/css/default.css" />""")
for item in delimitedList:
    print(item)
Output:
<meta charset="utf-8" />
<title> test123 </title>
<meta name="test" content="index,follow" />
<meta name="description" content="Description" />
<link rel="stylesheet" href="../layout/css/default.css" />
References:
The openingTagPattern is inspired from #Kobi 's answer here: https://stackoverflow.com/a/1732395/12109043

How to break lines in a list in a Python email

I am trying to query a list from my Flask database and then send it out as an HTML email. However, I am unable to break the items into separate lines.
For example, instead of:
a
b
c
I currently get abc in the email. I've tried adding "\n" in the loop, but it doesn't seem to work. Does anyone know how I can break it into separate rows?
def mail():
    sender_email = "xx#gmail.com"
    message = MIMEMultipart("alternative")
    message["Subject"] = "xx"
    message["From"] = sender_email
    message["To"] = user_mail
    add = '\n'
    list = Lines.query.all()
    for s in list:
        add += str(s.title) + '\r\n'
    print(add)
    # Write the plain text part
    text = "Thank you for submitting a xx! Here are the lines submitted: " + add
    # write the HTML part
    html = """\
<html>
<head><head style="margin:0;padding:0;">
<table role="presentation" style="width:100%;border-collapse:collapse;border:20;border-spacing:20;background:#cc0000;">
<tr>
<td align="center" style="padding:20;color:#ffffff;">
Your xxxxx was submitted!
</td>
</tr>
</table>
</head>
<p>Thank you for submitting a xx! Here are the lines submitted for your reference:<br><br>
""" + add + """
<br></br>
</p>
</html>
"""
    # convert both parts to MIMEText objects and add them to the MIMEMultipart message
    part1 = MIMEText(text, "plain")
    part2 = MIMEText(html, "html")
    message.attach(part1)
    message.attach(part2)
    ...
    server.sendmail("xx#gmail.com", user_mail, message.as_string())
    return redirect(url_for('complete'))
I believe what you're looking for is this:
list = Lines.query.all()
for s in list:
    add += str(s.title) + '<br>'
or (using str.format instead of string concatenation):
list = Lines.query.all()
for s in list:
    add += '{}<br>'.format(str(s.title))
or (Python 3.6+ f-strings):
list = Lines.query.all()
for s in list:
    add += f"{s.title}<br>"
In HTML a \n is just whitespace (the browser collapses it into a space), so use <br> to break lines.
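For the plain-text part, where \n does produce line breaks, you can start from an empty string and keep adding to it in the loop (avoid naming the variable str, which shadows the built-in):
lines_text = ""
for s in list:
    lines_text += f"{s.title}\n"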
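Putting the two answers together, a minimal sketch of building both parts of the "alternative" message: "\n" separators for the plain-text part and <br> separators for the HTML part. The `titles` list is a hypothetical stand-in for the titles returned by `Lines.query.all()`, and escaping each title with `html.escape` is an extra precaution not mentioned in the answers above:

```python
import html
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Hypothetical titles standing in for [s.title for s in Lines.query.all()].
titles = ["a", "b", "c"]

# Plain-text part: "\n" really is a line break here.
text_body = "Here are the lines submitted:\n" + "\n".join(titles)

# HTML part: "\n" is just whitespace, so separate the items with <br>.
# html.escape() keeps characters like < or & in a title from breaking the markup.
html_items = "<br>\n".join(html.escape(title) for title in titles)
html_body = (
    "<html><body>\n"
    "<p>Here are the lines submitted:<br><br>\n"
    f"{html_items}\n"
    "</p>\n</body></html>"
)

message = MIMEMultipart("alternative")
message["Subject"] = "Lines submitted"
message.attach(MIMEText(text_body, "plain"))
message.attach(MIMEText(html_body, "html"))
print(message.as_string())
```

Using `"<br>\n".join(...)` instead of appending in a loop also avoids a trailing separator after the last item.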

How to extract tags from HTML file and write them to a new file?

My HTML file has the format shown below:
<unit id="2" status="FINISHED" type="pe">
<S producer="Alice_EN">CHAPTER I Down the Rabbit-Hole</S>
<MT producer="ALICE_GG">CAPÍTULO I Abaixo do buraco de coelho</MT>
<annotations revisions="1">
<annotation r="1">
<PE producer="A1.ALICE_GG"><html>
<head>
</head>
<body>
CAPÍTULO I Descendo pela toca do coelho
</body>
</html></PE>
I need to extract ALL the content from two tags in the entire HTML file. The tag that starts with <unit id ...> is on one line, but the content of the other tag, which starts with "<PE producer ..." and ends with '</PE>', is spread over several lines. I need to extract the content of these two tags and write it to a new file one after another. My output should be:
<unit id="2" status="FINISHED" type="pe">
<PE producer="A1.ALICE_GG"><html>
<head>
</head>
<body>
CAPÍTULO I Descendo pela toca do coelho
</body>
</html></PE>
My code does not extract the content from all the tags in the file. Does anyone have a clue what is going on and how I can make this code work properly?
import codecs
import re
t=codecs.open('ALICE.per1_replaced.html','r')
t=t.read()
unitid=re.findall('<unit.*?"pe">', t)
PE=re.findall('<PE.*?</PE>', t, re.DOTALL)
for i in unitid:
    for j in PE:
        a=i + '\n' + j + '\n'
        with open('PEtags.txt','w') as fi:
            fi.write(a)
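You have a problem in the part of the code where you loop through the matches and write them to the file: a is reassigned on every iteration instead of accumulating, and opening the file in 'w' mode truncates it, so only the last pair ends up in PEtags.txt. The nested loops also pair every unit with every PE instead of matching them one-to-one.
If your unitid and PE match counts are the same, you can adjust the code to: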
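import re
# assuming the file is UTF-8 encoded (it contains characters such as "Í")
with open('ALICE.per1_replaced.html', 'r', encoding='utf-8') as t:
    contents = t.read()
unitid = re.findall('<unit.*?"pe">', contents)
PE = re.findall('<PE.*?</PE>', contents, re.DOTALL)
with open('PEtags.txt', 'w', encoding='utf-8') as fi:
    # zip pairs the n-th unit tag with the n-th PE block
    for i, p in zip(unitid, PE):
        fi.write("{}\n{}\n".format(i, p))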
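If the counts might differ (for example, some units have no PE block), the zip approach would silently pair the wrong items. A possible alternative, sketched here under the assumption that each <PE> block belongs to the nearest preceding <unit> tag, is to match each unit tag together with its PE in a single regex. The sample string is made up for illustration, and unlike the question's pattern this one accepts any <unit ...> tag, not only those ending in type="pe":

```python
import re

# Hypothetical sample: two <unit> elements, the second without a <PE> block.
sample = '''<unit id="2" status="FINISHED" type="pe">
<S producer="Alice_EN">CHAPTER I Down the Rabbit-Hole</S>
<PE producer="A1.ALICE_GG"><html>
<body>
CAPÍTULO I Descendo pela toca do coelho
</body>
</html></PE>
</unit>
<unit id="3" status="FINISHED" type="pe">
<S producer="Alice_EN">No post-edit here</S>
</unit>'''

# Capture a <unit ...> opening tag and the first <PE>...</PE> after it, but only
# if no other "<unit" starts in between (a "tempered" lazy dot).
UNIT_WITH_PE = re.compile(
    r'(<unit\b[^>]*>)(?:(?!<unit\b).)*?(<PE\b.*?</PE>)', re.DOTALL
)

def unit_pe_pairs(text):
    """Return (unit_tag, pe_block) tuples, skipping units without a PE block."""
    return [(m.group(1), m.group(2)) for m in UNIT_WITH_PE.finditer(text)]

for unit_tag, pe_block in unit_pe_pairs(sample):
    print(unit_tag)
    print(pe_block)
```

Regular expressions are fragile for markup in general; if the file is well-formed XML, an XML parser would be the more robust choice.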

Line numbers in Pygments code highlighting in XAMPP on Windows

I have configured XAMPP on Windows to work with Python 2.7 and Pygments. My PHP code is highlighted properly by Pygments on the website: the code has colors, span elements and classes.
But I cannot get line numbers.
From the tutorials I have read, this depends on the linenos value in the Python script. The value should be either table, inline, 1 or True.
But it does not work for me. It still gives the same final code:
<!doctype html>
<html lang="pl">
<head>
<meta charset="UTF-8">
<title>Document</title>
<link rel="stylesheet" href="gh.css">
</head>
<body>
<div class="highlight highlight-php"><pre><code><span class="nv">$name</span> <span class="o">=</span> <span class="s2">"Jaś"</span><span class="p">;</span>
<span class="k">echo</span> <span class="s2">"Zażółć gęślą jaźń, "</span> <span class="o">.</span> <span class="nv">$name</span> <span class="o">.</span> <span class="s1">'.'</span><span class="p">;</span>
<span class="k">echo</span> <span class="s2">"hehehe#jo.io"</span><span class="p">;</span>
</code></pre></div>
</html>
How can I add line numbers? Here are the two files of the website:
index.py
import sys
from pygments import highlight
from pygments.formatters import HtmlFormatter
# If there aren't exactly 2 args, something weird is going on
expecting = 2
if len(sys.argv) != expecting + 1:
    sys.exit(128)
# Get the code
language = (sys.argv[1]).lower()
filename = sys.argv[2]
f = open(filename, 'rb')
code = f.read()
f.close()
# PHP
if language == 'php':
    from pygments.lexers import PhpLexer
    lexer = PhpLexer(startinline=True)
# GUESS
elif language == 'guess':
    from pygments.lexers import guess_lexer
    lexer = guess_lexer(code)
# GET BY NAME
else:
    from pygments.lexers import get_lexer_by_name
    lexer = get_lexer_by_name(language)
# OUTPUT
formatter = HtmlFormatter(linenos='table', encoding='utf-8', nowrap=True)
highlighted = highlight(code, lexer, formatter)
print highlighted
index.php
<?php
define('MB_WPP_BASE', dirname(__FILE__));
function mb_pygments_convert_code($matches)
{
$pygments_build = MB_WPP_BASE . '/index.py';
$source_code = isset($matches[3]) ? $matches[3] : '';
$class_name = isset($matches[2]) ? $matches[2] : '';
// Creates a temporary filename
$temp_file = tempnam(sys_get_temp_dir(), 'MB_Pygments_');
// Populate temporary file
$filehandle = fopen($temp_file, "w");
fwrite($filehandle, html_entity_decode($source_code, ENT_COMPAT, 'UTF-8'));
fclose($filehandle);
// Creates pygments command
$language = $class_name ? $class_name : 'guess';
$command = sprintf('C:\Python27/python %s %s %s', $pygments_build, $language, $temp_file);
// Executes the command
$retVal = -1;
exec($command, $output, $retVal);
unlink($temp_file);
// Returns Source Code
$format = '<div class="highlight highlight-%s"><pre><code>%s</code></pre></div>';
if ($retVal == 0)
$source_code = implode("\n", $output);
$highlighted_code = sprintf($format, $language, $source_code);
return $highlighted_code;
}
// This prevent throwing error
libxml_use_internal_errors(true);
// Get all pre from post content
$dom = new DOMDocument();
$dom->loadHTML(mb_convert_encoding('
<pre class="php">
<code>
$name = "Jaś";
echo "Zażółć gęślą jaźń, " . $name . \'.\';
echo "<address>hehehe#jo.io</address>";
</code>
</pre>', 'HTML-ENTITIES', "UTF-8"), LIBXML_HTML_NODEFDTD);
$pres = $dom->getElementsByTagName('pre');
foreach ($pres as $pre) {
$class = $pre->attributes->getNamedItem('class')->nodeValue;
$code = $pre->nodeValue;
$args = array(
2 => $class, // Element at position [2] is the class
3 => $code // And element at position [3] is the code
);
// convert the code
$new_code = mb_pygments_convert_code($args);
// Replace the actual pre with the new one.
$new_pre = $dom->createDocumentFragment();
$new_pre->appendXML($new_code);
$pre->parentNode->replaceChild($new_pre, $pre);
}
// Save the HTML of the new code.
$newHtml = "";
foreach ($dom->getElementsByTagName('body')->item(0)->childNodes as $child) {
$newHtml .= $dom->saveHTML($child);
}
?>
<!doctype html>
<html lang="pl">
<head>
<meta charset="UTF-8">
<title>Document</title>
<link rel="stylesheet" href="gh.css">
</head>
<body>
<?= $newHtml ?>
</body>
</html>
Thank you
While reading the file try readlines:
f = open(filename, 'rb')
code = f.readlines()
f.close()
This way you will get multiple lines. Then use:
formatter = HtmlFormatter(linenos='table', encoding='utf-8', nowrap=True)
Suggestion:
A more Pythonic way of opening files is:
with open(filename, 'rb') as f:
    code = f.readlines()
That's it: the context manager closes the file for you.
Solved! The cause was nowrap=True. From the HtmlFormatter documentation:
nowrap
If set to True, don’t wrap the tokens at all, not even inside a tag. This disables most other options (default: False).
http://pygments.org/docs/formatters/#HtmlFormatter
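A minimal sketch of the fix, written for Python 3 and a current Pygments release rather than the question's Python 2.7 setup: with nowrap left at its default, `HtmlFormatter(linenos="table")` emits the two-column table that holds the line numbers. The PHP snippet is shortened from the question. One untested assumption worth checking: index.php wraps the script output in its own `<pre><code>` via `$format`, which would now put a table inside `<pre>`, so that wrapper probably needs simplifying.

```python
from pygments import highlight
from pygments.formatters import HtmlFormatter
from pygments.lexers import PhpLexer

php_code = '$name = "Jaś";\necho "Hello, " . $name . \'.\';\n'

# Without nowrap=True the formatter is free to emit its wrapper markup,
# including the table that holds the line numbers.
formatter = HtmlFormatter(linenos="table")
highlighted = highlight(php_code, PhpLexer(startinline=True), formatter)
print(highlighted)

# CSS rules for the highlighting classes used in the output (e.g. for gh.css).
css = formatter.get_style_defs(".highlight")
```

Leaving out `encoding='utf-8'` makes `highlight()` return a `str` instead of `bytes`, which is simpler to print in Python 3.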

Joining HTML header, body and trailer using Python

I'm trying to join an HTML header with the body and trailer like this:
message1 = """<html>
<head></head>
<body><p>"""
message2 = 'Hello World!'
message3 = """</p></body>
</html>"""
html_message = join(message1,message2,message3)
but when I print html_message the result is "\Hello World!\". Why do the backslashes appear, and how can I remove them?
I suppose you are using the wrong imported method, but you can use the percent operator with strings:
template = """
<html>
<head></head>
<body>
<p>%(text)s</p>
</body>
</html>
"""
html_message = template % {"text":"Hello World!"}
Don't forget, the placeholder format is: percent sign + name in parentheses + conversion type (s for string).
Here is how I would join the string:
message1 = """<html>
<head></head>
<body><p>"""
message2 = 'Hello World!'
message3 = """</p></body>
</html>"""
html_message = "".join([message1,message2,message3])
The join you are using (os.path.join) joins path components, which are separated like Something/another/thing (with backslashes on Windows), hence the backslashes. There is no need for from os.path import basename, join.
To join strings, use the + operator:
html_message = message1 + message2 + message3
os.path.join is used to build file paths, e.g. os.path.join('my', 'script.py') gives my/script.py (my\script.py on Windows).
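A small sketch reproducing the backslashes. `ntpath` and `posixpath` are the Windows and POSIX flavours of `os.path`, and both can be imported on any OS, so calling their `join` on the three strings shows what the question's code produced next to plain string joining. The assumption that the asker was on Windows comes only from the backslashes in the reported output:

```python
import ntpath
import posixpath

message1 = "<html>\n<head></head>\n<body><p>"
message2 = "Hello World!"
message3 = "</p></body>\n</html>"

# os.path.join on Windows is ntpath.join: it inserts "\" between the pieces.
windows_path_join = ntpath.join(message1, message2, message3)
# On Linux/macOS os.path.join is posixpath.join and inserts "/" instead.
posix_path_join = posixpath.join(message1, message2, message3)

# String concatenation inserts nothing between the pieces.
joined = "".join([message1, message2, message3])

print(repr(windows_path_join))
print(joined)
```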
