I have HTML text containing non-ASCII characters, and I want to print it using the HtmlEasyPrinting module.
printer = HtmlEasyPrinting()
text = '''
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<b><</b>
<p>é</p>
</body>
</html>'''
printer.PreviewText(text)
printer.PrintText(text)
When I run the preview, everything looks fine (the right encoding is applied).
When I call PrintText instead, things go wrong: non-ASCII characters come out garbled on paper or in the exported file.
Does anyone know why the preview works but the printing itself does not? Is there some setting that needs to be applied?
wx.version = 2.8.11.0 (gtk2-unicode)
python 2.7
Related
I intend to use Python to generate HTML code that will be saved to an external file. The HTML will display the current day. No JavaScript or PHP allowed. How can I make the Python code run again every day after midnight, so that the HTML file always contains the correct day of the week? Thank you. My code:
import datetime
t=datetime.datetime.now()
s="""<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8"/>
<title>Daily Portal</title>
</head>
<body>
<div>{maininfo:}</div>
</body>
</html>"""
s1=t.strftime("%A")
s2="Good morning John. Today is "+s1+"."+" Have a beautiful day."
s=s.format(maininfo=s2)
print(s)
You can set up a job scheduler to run the script every day, for example cron if you are on a Unix system.
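As a rough sketch of how the script could be arranged for a scheduler: the page is built in one function and written to a file in another, so each scheduled run just regenerates the file. The function names, the output file name index.html, and the crontab line in the comment are assumptions for illustration, not part of the original code.

```python
import datetime


def build_page(now):
    """Return the HTML page text for the given datetime."""
    template = """<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8"/>
<title>Daily Portal</title>
</head>
<body>
<div>{maininfo}</div>
</body>
</html>"""
    day = now.strftime("%A")
    greeting = "Good morning John. Today is " + day + ". Have a beautiful day."
    return template.format(maininfo=greeting)


def write_page(path, now=None):
    """Regenerate the file at `path`; intended to run once per scheduled job."""
    if now is None:
        now = datetime.datetime.now()
    with open(path, "w") as f:
        f.write(build_page(now))


# Hypothetical crontab entry (paths are assumptions) that runs the script
# one minute after midnight every day:
# 1 0 * * * /usr/bin/python3 /home/john/daily_portal.py
if __name__ == "__main__":
    write_page("index.html")  # assumed output file name
```

Running it a minute past midnight rather than exactly at 00:00 leaves a small margin in case the job starts slightly early relative to the system clock.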
Here is the MWE, test.py - the test webpage that is written inline as mypage, is served from http://sdaaubckp.sourceforge.net/test/test-utf8.html , so you should be able to run this script as-is:
#!/usr/bin/python
# -*- coding: utf-8 -*-
import os, sys
import re
import lxml.html as LH
import requests
if sys.version_info[0]<3: # python 2
    from StringIO import StringIO
else: #python 3
    from io import StringIO
# this page uploaded on: http://sdaaubckp.sourceforge.net/test/test-utf8.html
mypage = """
<!doctype html>
<html lang="en">
<head>
<!-- Basic Page Needs
–––––––––––––––––––––––––––––––––––––––––––––––––– -->
<meta charset="utf-8">
<title>My Page</title>
<meta name="description" content="">
<meta name="author" content="">
</head>
<body>
<div>Testing: tøst</div>
</body>
</html>
"""
url_page = "http://sdaaubckp.sourceforge.net/test/test-utf8.html"
confpage = requests.get(url_page)
print(confpage.encoding) # it detects ISO-8859-1, even if the page declares <meta charset="utf-8">?
confpage.encoding = "UTF-8"
print(confpage.encoding) # now it says UTF-8, but...
#print(confpage.content)
if sys.version_info[0]<3: # python 2
    mystr = confpage.content
else: #python 3
    mystr = confpage.content.decode("utf-8")
for line in iter(mystr.splitlines()):
    if 'Testing' in line:
        print(line)
confpagetree = LH.fromstring(confpage.content)
print(confpagetree) # <Element html at 0x7f4b7074eec0>
#print(confpagetree.text_content())
for line in iter(confpagetree.text_content().splitlines()):
    if 'Testing' in line:
        print(line)
I'm running this on Ubuntu 14.04.5 LTS; both Python 2 and 3 give the same results with this script:
$ python2 test.py
ISO-8859-1
UTF-8
<div>Testing: tøst</div>
<Element html at 0x7fb5b9d12ec0>
Testing: tÃ¸st
$ python3 test.py
ISO-8859-1
UTF-8
<div>Testing: tøst</div>
<Element html at 0x7f272fc53318>
Testing: tÃ¸st
Note how:
In both cases, confpage.encoding detects ISO-8859-1, even if the webpage declares <meta charset="utf-8">
In both cases, correct UTF-8 character ø is printed from confpage.content
In both cases, the corrupt representation Ã¸ (the UTF-8 bytes of ø read as Latin-1) is output from lxml.html.fromstring(confpage.content).text_content()
My suspicion is that, since the webpage uses the – UTF-8 character (Char: '–' u: 8211 [0x2013] b: 226,128,147 [0xE2,0x80,0x93] n: EN DASH [General Punctuation]) before it declares <meta charset="utf-8"> in the <head>, this somehow borks requests and/or lxml.html.fromstring().text_content(), which results in the corrupt representation.
My question is - what can I do, so I get a correct UTF-8 character at the output of lxml.html.fromstring().text_content() - hopefully for both Python 2 and 3?
The root problem is that you're using confpage.content instead of confpage.text.
requests.Response.content gives you the raw bytes (bytes in 3.x, str in 2.x), as pulled off the wire. It doesn't matter what encoding is set to, because content never uses it.
requests.Response.text gives you the decoded Unicode (str in 3.x, unicode in 2.x), based on the encoding.
So, setting the encoding but then using content doesn't do anything. If you just change the rest of your code to use text instead of content (and get rid of the now-spurious decode for Python 3), it will work:
mystr = confpage.text
for line in iter(mystr.splitlines()):
    if 'Testing' in line:
        print(line)
confpagetree = LH.fromstring(confpage.text)
print(confpagetree) # <Element html at 0x7f4b7074eec0>
#print(confpagetree.text_content())
for line in iter(confpagetree.text_content().splitlines()):
    if 'Testing' in line:
        print(line)
If you want to go through the exact problem with each of your examples:
Your first example is right in Python 3, but not the best way to do it. By calling decode("utf-8") on the content, since the bytes do happen to be UTF-8, you're decoding them properly. So they will print out properly.
Your first example is wrong in Python 2. You're just printing the content, which is a bunch of UTF-8 bytes. If your console is UTF-8 (as it is on macOS, and might be on Linux), this will happen to work. If your console is something else, like cp1252 or Latin-1 (as it is on Windows, and might be on Linux), this will give you mojibake.
Your second example is also wrong. By passing bytes to LH.fromstring, you're forcing lxml to guess what encoding to use, and it guesses Latin-1, so you get mojibake.
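The difference between the two decodings can be reproduced without any network access or lxml. This is only an illustrative sketch: the byte string below stands in for confpage.content, and the variable names are made up for the example.

```python
# -*- coding: utf-8 -*-
# Stand-in for confpage.content: the UTF-8 bytes as they come off the wire.
raw = u"<div>Testing: t\u00f8st</div>".encode("utf-8")

# Decoding with the right codec, which is what .text does once
# .encoding has been set to UTF-8.
good = raw.decode("utf-8")

# Decoding with a Latin-1 guess: each UTF-8 byte becomes its own character.
bad = raw.decode("iso-8859-1")

print(good == u"<div>Testing: t\u00f8st</div>")
print(bad == u"<div>Testing: t\u00c3\u00b8st</div>")
```

The letter ø is two bytes in UTF-8 (0xC3 0xB8), so a Latin-1 decode turns it into the two characters Ã and ¸, which is the mojibake seen in the question's output.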
# After the "Content-type..." declaration...
print """<html>\
<head>
<title>Create Survey</title>
<link href="styles.css" type="text/css" rel="stylesheet">
</head>
<body>...."""
Assuming you are using something like CGI to "print" text in response to a web request, you should rely on your web server (Apache, for example) to serve the CSS file to the requesting client, based on where the file is located in the htdocs directory. Your script only needs to print the <link> tag.
However, if you just want the stylesheet's contents printed in a command line window, you could do...
print(open('/path/to/your/file/styles.css').read())
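For the CGI case, a minimal sketch might look like the following. It assumes styles.css sits in the same served directory as the script's URL; the function name build_response is hypothetical. The script emits only the header and the HTML, and the browser then requests styles.css from the web server separately.

```python
def build_response():
    """Return a CGI response: header, blank line, then the HTML body."""
    header = "Content-Type: text/html\n\n"
    body = """<html>
<head>
<title>Create Survey</title>
<link href="styles.css" type="text/css" rel="stylesheet">
</head>
<body>
</body>
</html>"""
    return header + body


print(build_response())
```

The blank line after the header is required; without it the server cannot tell where the headers end and the page begins.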
I've installed the markdown preview plugin for gedit running on Lubuntu 13.04. It works as expected.
However, for ease of viewing, I altered the appearance of the resulting html panel (left panel) by including a link to a local stylesheet at the top of each markdown file. But this approach obviously means that I have to alter all my existing markdown files.
To avoid that, I looked at ~/.local/share/gedit/plugins/markdown-preview/__init__.py which has the code for the plugin, and I see lines #39 and #40 (reproduced below):
# Can be used to add default HTML code (e.g. default header section with CSS).
htmlTemplate = "%s"
That gives me the impression that I can somehow tell the plugin to look at a stylesheet and style the html accordingly. But I don't know what to do (if indeed htmlTemplate = "%s" has to be changed).
Set htmlTemplate to something like the following:
# Can be used to add default HTML code (e.g. default header section with CSS).
htmlTemplate = """
<html>
<head>
<link rel="stylesheet" type="text/css" charset="utf-8" media="screen" href="http://paste.ubuntu.com/static/pastebin.css">
</head>
<body>
%s
</body>
</html>
"""
I'm trying to output some of my data to an HTML file.
Python has no problem creating a new file, but it seems to have a problem with the write command. The program runs with no errors or warnings, but the file size remains 0 KB (empty).
I'm a bit of a newbie to python, so I'm hoping someone can point out my mistake.
Here is the code:
#OUTPUT
calcfile = open('calculation.html','w');
CALCOUT = """<!DOCTYPE html>
<html>
<head>
<title>Quick Calculation</title>
</head>
<body>
<h1>Estimate</h1>
<table>
"""
#Some code which appends to CALCOUT -- long but it works perfectly via STDOUT.
calcfile.write("%s" % CALCOUT);
#also tried calcfile.write(CALCOUT);
You have to remember to close the file after writing to it; until then the data may still be sitting in a buffer rather than on disk. Or even better, use the with construct, which closes the file automatically as soon as the with block is exited.
with open('calculation.html','w') as calcfile:
    CALCOUT = """<!DOCTYPE html>
<html>
<head>
<title>Quick Calculation</title>
</head>
<body>
<h1>Estimate</h1>
<table>
"""
    calcfile.write(CALCOUT)
Try this:
calcfile.write(str(CALCOUT))
Also, there are no semicolons needed in Python.
You have to calcfile.close() the file of course.
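Putting the answers together, a self-contained sketch could look like this. The write_report function, the sample rows, and the temporary output directory are assumptions made for the example; the point is only that the file has content once the with block has closed it.

```python
import os
import tempfile


def write_report(path, rows):
    """Write a small HTML table to `path` and return the file size in bytes."""
    parts = ["<!DOCTYPE html>\n<html>\n<head>\n<title>Quick Calculation</title>\n"
             "</head>\n<body>\n<h1>Estimate</h1>\n<table>\n"]
    for label, value in rows:
        parts.append("<tr><td>%s</td><td>%s</td></tr>\n" % (label, value))
    parts.append("</table>\n</body>\n</html>\n")

    # Leaving the with block closes the file, which flushes it to disk.
    with open(path, "w") as calcfile:
        calcfile.write("".join(parts))

    return os.path.getsize(path)


# Usage with made-up rows, written to a temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), "calculation.html")
size = write_report(out_path, [("Paint", 40), ("Labour", 120)])
print(size > 0)
```

Checking the size after the with block, rather than inside it, avoids reading a stale value while data is still buffered.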