Weasyprint and CSS: header, footer, pagebreak and positioning - python

I am building an invoice report template in HTML and using Weasyprint to generate it as a PDF (and eventually as a docx).
The issue I'm having is that I can neither get page breaks to work nor generate a running header and footer properly: the body contents overlap them and turn my data into zalgo text.
My report template has this simple format:
+==========================+
+ Header +
+==========================+
+ Body +
+==========================+
+ Footer +
+==========================+
Both the header and footer should appear on every page. The header includes a page counter, while the footer displays a value within a textbox only on the last page.
The header and footer live in separate HTML templates for versatility and are pulled in with the include keyword. As this is a template for an invoice, the header is closer to a letterhead.
The main content goes in the body. If there is too much content, it should break and continue on the next page.
For all 3 parts, I am using tables for formatting, mainly to keep my data aligned.
Here is a sample of my main HTML body:
<!DOCTYPE html>
<html>
<head>
<style type="text/css" media="all">
@page {
size: A4 portrait; /* can use also 'landscape' for orientation */
margin: 1cm;
@top-left{
content: element(header);
}
@bottom-left{
content: element(footer);
}
}
header, footer, .body_content{
font-size: 12px;
/* color: #000; */
font-family: Arial;
width: 100%;
position: relative;
}
header {
/*position: fixed;*/
/*position: running(header); */
/*display: block; */
}
footer {
position: fixed;
/*position: running(footer);*/
/*position: absolute;*/
bottom: 0;
/*display: block;*/
}
.body_content {
position: relative;
page-break-inside: auto;
height: 320pt;
/*overflow: hidden;*/
}
</style>
</head>
<body>
<header>
{% include 'sampleTemplate_header.html' %}
</header>
<div >
<table class="body_content">
<tbody >
<tr style="padding-top:5px;" >
<td style="width:60%;" >
</td>
<td style="width:10%;" >
</td>
<td style="width:15%;" >
</td>
<td style="width:15%;" >
</td>
</tr>
<tr>
</tr>
<tr >
<td style="width:60%;" >
</td>
<td style="width:10%;" >
</td>
<td style="width:15%;" align="right" >
</td>
<td style="width:15%;" align="center" >
</td>
</tr>
<tr >
<td style="width:60%;" id="testCell" >
</td>
<td style="width:10%;" >
</td>
<td style="width:15%;" >
</td>
<td style="width:15%;" >
</td>
</tr>
</tbody>
</table>
</div>
<footer>
{% include 'sampleTemplate_footer.html' %}
</footer>
</body>
</html>
The CSS portion has a lot of commented-out code from my experimenting with the layout, but however much I change, I can't seem to get the layout I need.
My most persistent issue has been the body content overlapping the header or the footer. The latter happens even with a forced page-break-after.

I got it working with running elements (I was reading about them here: https://www.w3.org/TR/css-gcpm-3/#running-elements).
If you put the footer before the main content, it will show on every page. Running elements move an element out of the main flow and into the margin, so my guess is that an element that has not yet appeared in the flow cannot be moved into the margin of the current page.
To stop the header/footer from overlapping the contents, I had to play around with the margin values in the @page rule. Since a running element is moved into the margin, making the margin bigger gives it more space. In the example below, if you decrease the top (or bottom) margin, the header (or footer) will overlap the body.
Sometimes I have to set the header/footer height to get it positioned in the margin properly, but I didn't have to for this example.
<!DOCTYPE html>
<html>
<head>
<style type="text/css" media="all">
@page {
size: A4 portrait; /* can use also 'landscape' for orientation */
margin: 100px 1cm 150px 1cm;
@top-left{
content: element(header);
}
@bottom-left{
content: element(footer);
}
}
header {
position: running(header);
/*height: 100px;*/
}
footer {
position: running(footer);
/*height: 150px;*/
}
</style>
</head>
<body>
<header>
multiline<br>header<br>lots<br>of<br>lines<br>here<br>
</header>
<footer>
multiline<br>footer<br>lots<br>of<br>lines<br>here
</footer>
<div >
stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff
</div>
</body>
</html>
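For reference, the same layout can be driven from Python. The following is a minimal sketch, not the answerer's code: `build_invoice_html` and `render_pdf` are hypothetical helper names, and the margin defaults simply mirror the 100px/150px values from the example above. It assumes WeasyPrint is installed for the rendering step and only relies on `HTML(string=...).write_pdf(...)`.

```python
def build_invoice_html(header, footer, body, top_margin="100px", bottom_margin="150px"):
    """Return an HTML document whose header/footer are running elements.

    The header and footer are placed before the body so that they are
    available for the margin boxes from the first page onwards.
    """
    css = f"""
    @page {{
        size: A4 portrait;
        margin: {top_margin} 1cm {bottom_margin} 1cm;
        @top-left {{ content: element(header); }}
        @bottom-left {{ content: element(footer); }}
    }}
    header {{ position: running(header); }}
    footer {{ position: running(footer); }}
    """
    return (
        "<!DOCTYPE html><html><head>"
        f"<style>{css}</style></head><body>"
        f"<header>{header}</header>"
        f"<footer>{footer}</footer>"
        f"<div>{body}</div>"
        "</body></html>"
    )


def render_pdf(html_text, target="invoice.pdf"):
    """Write the document to a PDF file (requires WeasyPrint)."""
    from weasyprint import HTML  # imported lazily so the builder works without it
    HTML(string=html_text).write_pdf(target)


html_text = build_invoice_html("Letterhead", "Total due", "line<br>" * 200)
# render_pdf(html_text)  # uncomment when WeasyPrint is installed
```

If the header or footer grows by a few lines, the matching margin argument likely needs to grow with it, since the margin box is the only space the running element gets.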

Related

PDFkit not preserving style dimensions in inches

I'm using Jinja2 to generate HTML for flash cards (4 to a page). I have CSS styles that set width and height for table cells in inches so that they're sized the way I want. When I open the HTML file in Firefox or Chrome and print to PDF, the cards fill the whole page as expected.
However, I'd like to be able to generate a PDF for users, so I'm using pdfkit to generate a PDF from the HTML. When I generate a PDF using pdfkit, the flash cards are significantly smaller - around 2.8 inches wide instead of 3.75 inches. It seems like pdfkit is scaling the elements differently for some reason.
Below is a minimal reproduction of the problem in two files. Opening test.html in Chrome or Firefox and printing to 8.5x11 paper (or saving to PDF with those settings) should show the cards filling the page as expected. Generating a pdf using test.py should generate a pdf with much smaller cards.
test.html:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<style>
@page {
margin: 0.4in 0.4in 0.4in 0.4in;
}
.card {
width: 3.75in;
height: 4.95in;
font-size: 10pt;
}
table.card-table, td.card{
border: 1px;
border-style: dotted;
border-color: #AAAAAA;
vertical-align: top;
}
table.card-table {
page-break-after: always;
border-collapse: collapse;
}
</style>
</head>
<body>
<table class="card-table">
<tbody>
<tr>
<td class="card">Test</td>
<td class="card">Test</td>
</tr>
<tr>
<td class="card">Test</td>
<td class="card">Test</td>
</tr>
</tbody>
</table>
<table class="card-table">
<tbody>
<tr>
<td class="card">Test</td>
<td class="card">Test</td>
</tr>
<tr>
<td class="card">Test</td>
<td class="card">Test</td>
</tr>
</tbody>
</table>
</body>
</html>
test.py:
import pdfkit
pdfkit.from_file('test.html', 'out.pdf', options={
'dpi': 300,
'page-size': 'Letter',
'margin-top': '0.4in',
'margin-right': '0.4in',
'margin-bottom': '0.4in',
'margin-left': '0.4in',
})

How to use border-radius while converting html to pdf using xhtmltopdf

I am trying to round the corners of my table, but border-radius doesn't seem to work when I convert the HTML below to PDF using the xhtmltopdf PDF generator. Here is the HTML content; the file name is sticker_print.html:
<div class="sticker" style="height:196px">
<table class="sticker_box" align="left">
<tr>
<td style="border: 1px solid #222;background-color: #ffffff;">
<h3 style="border-bottom: 1px solid #222222;">Batch Sticker</h3>
<h5 style="padding: 0 0 0 10px;">Batch ID</h5>
<p>MFG Date</p>
<p style="padding-bottom:0px;"><img src="http://www.computalabel.com/Images/C128ff#2x.png" width="195px" height="26px"><span> Bar Code </span></p>
<p style="text-align: left; padding-bottom: 0px;">
<img src="https://www.kaspersky.com/content/en-global/images/repository/isc/2020/9910/a-guide-to-qr-codes-and-how-to-scan-qr-codes-2.png" width="65px" height="65px">
<span style="display: block;margin-top: 0px;">QR Code</span>
</p>
</td>
</tr>
</table>
</div>
PDF CODE
pdf = render_to_pdf('sticker_print.html')
return HttpResponse(pdf, content_type='application/pdf')
Even though I'm not using the same PDF engine as you (and your question is 6 months old), I solved this issue by using corner-radius instead of border-radius on a table cell or div.

Formatting flask-table output with html on python

Not a very experienced user with python here, just picked up python while staying at home. I was looking into integrating the output of flask-table to a website. The following is my python code that creates a simple table for my website.
from flask import Flask, render_template
from flask_table import Table, Col
app = Flask(__name__)
# Declare the table: one Col per column
class ItemTable(Table):
    name = Col('Name')
    description = Col('Description')
class Item(object):
    def __init__(self, name, description):
        self.name = name
        self.description = description
items = [dict(name='Name1', description='Description1'),
         dict(name='Name2', description='Description2'),
         dict(name='Name3', description='Description3')]
table = ItemTable(items)
@app.route("/")
def hello():
    # Pass the rendered table markup to the template as a string
    return render_template('index.html', tStrToLoad=table.__html__())
if __name__ == "__main__":
    app.run()
and my html code that takes tStrToLoad from the python code above to display.
<html>
<head>
<title>Test Flask Table</title>
<style>
body
{
background-color: #000000;
color: #FFFFFF;
font-family:Verdana;
font-size:16px;
}
table, th, td
{
border: 1px solid #0088ff;
border-collapse: collapse;
padding: 3px;
font-family:Verdana;
font-size:12px;
text-align: left;
}
</style>
</head>
<body>
a simple test table
<br/><br/>
{{ tStrToLoad }}
</body>
</html>
Instead of showing a table with the data, I get the following output on a black background:
a simple test table
<table> <thead><tr><th>Name</th><th>Description</th></tr></thead> <tbody> <tr><td>Name1</td><td>Description1</td></tr> <tr><td>Name2</td><td>Description2</td></tr> <tr><td>Name3</td><td>Description3</td></tr> </tbody> </table>
Upon further investigation, I viewed the page source; this is the actual HTML that is produced.
<html>
<head>
<title>Test Flask Table</title>
<style>
body
{
background-color: #000000;
color: #FFFFFF;
font-family:Verdana;
font-size:16px;
}
table, th, td
{
border: 1px solid #0088ff;
border-collapse: collapse;
padding: 3px;
font-family:Verdana;
font-size:12px;
text-align: left;
}
</style>
</head>
<body>
a simple test table
<br/><br/>
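&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Name&lt;/th&gt;&lt;th&gt;Description&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;Name1&lt;/td&gt;&lt;td&gt;Description1&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Name2&lt;/td&gt;&lt;td&gt;Description2&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Name3&lt;/td&gt;&lt;td&gt;Description3&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;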
</body>
</html>
My question is: how do I format it correctly in my python script so that it sends < and > instead of &lt; and &gt;?
If you're confident that none of the data in the table contains unvalidated user input, tell Jinja not to escape the HTML:
{{ tStrToLoad | safe }}
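To see why the entities appear in the first place, here is a small sketch that uses the standard library's `html.escape` as a stand-in for Jinja's autoescaping. This is an assumption for illustration: Jinja uses its own escaping routine, but `<` and `>` are converted the same way. The `table_html` value is a shortened, made-up version of what flask-table returns.

```python
import html

table_html = "<table><tr><td>Name1</td></tr></table>"

# What an autoescaping template effectively does with a plain string
escaped = html.escape(table_html)
print(escaped)

# What reaches the browser when the string is marked safe: the markup itself
print(table_html)
```

The `safe` filter skips the escaping step, which is why it should only be applied to markup you trust.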
However, you may wish to avoid flask-table altogether and just pass the items list yourself. The template code can become more generic if you also pass the headers in a separate list:
headers = ['Name','Description']
return render_template('index.html',
headers = headers,
objects = items)
Then manually build the table:
{% if objects %}
<table id='result' class='display'>
<thead>
<tr>
{% for header in headers %}
<th>{{header}}</th>
{% endfor %}
</tr>
</thead>
<tbody>
{% for object in objects %}
<tr>
{% for k, val in object.items() %}
<td>{{val}}</td>
{% endfor %}
</tr>
{% endfor %}
</tbody>
</table>
{% endif %}
As you can see, this prevents you from having to hard-code any values into the template.
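The same loop can be sketched in plain Python, which may help when checking what the template should output. `render_table` is a hypothetical helper, not part of Flask or flask-table. Like the template, it emits one cell per dict value in insertion order; unlike the template, it escapes the values explicitly with `html.escape`, because nothing autoescapes for you outside Jinja.

```python
import html

def render_table(headers, objects):
    """Build an HTML table from a list of header labels and a list of dicts."""
    if not objects:
        return ""
    head = "".join(f"<th>{html.escape(str(h))}</th>" for h in headers)
    rows = []
    for obj in objects:
        # One cell per value, in the dict's insertion order, escaped for safety
        cells = "".join(f"<td>{html.escape(str(v))}</td>" for v in obj.values())
        rows.append(f"<tr>{cells}</tr>")
    return (
        "<table id='result' class='display'>"
        f"<thead><tr>{head}</tr></thead>"
        f"<tbody>{''.join(rows)}</tbody>"
        "</table>"
    )

items = [dict(name='Name1', description='Description1'),
         dict(name='Name2', description='<b>Description2</b>')]
table_markup = render_table(['Name', 'Description'], items)
print(table_markup)
```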
The syntax of this table is also compatible with DataTables, so if you load the CSS and JS files as per the zero configuration example, your table becomes prettier and gains a built-in search bar and pagination. Let me know if you need further direction on how to do this.

extract log file data and input directly into xhtml body

I currently have a python script that takes a log file and strips any defined 'excluded' keywords from it in place. After extracting the required text, I want to insert it directly into the "body" section of a pre-built XHTML file.
Is there a way that this can be accomplished?
My code for writing the extracted log content to the XHTML file is as follows, but it currently overwrites the XHTML file (which I expect, as this is where I am stuck).
I have read up on BeautifulSoup but I don't want to go down that path; I want to keep this all within the python file if possible.
# Raw string so that \t and \f in the path are not treated as escape sequences
with open(r'\path\to\file.log', 'r') as contents, open("output.html", "w") as writehtml:
    for lines in contents.readlines():
        writehtml.write("<pre>" + lines + "</pre> <br>\n")
The markup in the body section of my XHTML page is as follows:
<body>
<tr>
<td bgcolor="#ffffff" style="padding: 40px 30px 40px 30px;">
<table border="1" cellpadding="0" cellspacing="0" width="100%%">
<tr>
<td style="padding: 10px 0 10px 0; font-family: Calibri, sans-serif; font-size: 16px;">
<!-- Body text from file goes here-->
Body Text Replaces Here
</td>
</tr>
</table>
</td>
</tr>
</table>
</td>
</tr>
</body>
Thanks.
How is this?
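# Keep the template pieces as strings and write the log lines in between them
# Raw string so that \t and \f in the path are not treated as escape sequences
contents = open(r'\path\to\file.log', 'r')
# Suppose the beginning of your template is stored in this file: \path\template\start.txt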
start = '''
<body>
<tr>
<td bgcolor="#ffffff" style="padding: 40px 30px 40px 30px;">
<table border="1" cellpadding="0" cellspacing="0" width="100%%">
<tr>
<td style="padding: 10px 0 10px 0; font-family: Calibri, sans-serif; font-size: 16px;">
'''
# start = open(r'\path\template\start.txt', 'r').read()
# Assume the end of your template is stored in this file: \path\template\end.txt
end = '''
</td>
</tr>
</table>
</td>
</tr>
</table>
</td>
</tr>
</body>
'''
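# end = open(r'\path\template\end.txt', 'r').read()
# Mode "a" appends; use "w" instead if output.html should be recreated on each run
with open("output.html", "a") as writehtml:
    writehtml.write(start)
    for lines in contents.readlines():
        writehtml.write("<pre>" + lines + "</pre> <br>\n")
    writehtml.write(end)
contents.close()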
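An alternative that keeps the pre-built XHTML file intact is to read it as text and replace the placeholder line. The sketch below assumes the template contains the literal text `Body Text Replaces Here` (as in the question); `fill_template` and the `excluded` keyword list are hypothetical names, and dropping whole lines that contain an excluded keyword is only one possible reading of "stripped".

```python
import html

PLACEHOLDER = "Body Text Replaces Here"

def fill_template(template_text, log_lines, excluded=()):
    """Return template_text with the placeholder replaced by the log lines."""
    kept = []
    for line in log_lines:
        line = line.rstrip("\n")
        # Skip lines containing any excluded keyword
        if any(word in line for word in excluded):
            continue
        # Escape <, > and & so the log text cannot break the markup
        kept.append("<pre>" + html.escape(line) + "</pre>")
    return template_text.replace(PLACEHOLDER, "\n".join(kept), 1)

template = "<body><td>\nBody Text Replaces Here\n</td></body>"
log = ["started ok\n", "DEBUG noisy detail\n", "value < limit\n"]
result = fill_template(template, log, excluded=["DEBUG"])
print(result)
```

With real files, the template text would come from reading the XHTML file, and the result would be written to a separate output file so the template can be reused.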

Locating an element using Selenium with Python

Edited based on the answers:
I am using Selenium with Python and trying to locate a button in a web application in Chrome. The block of code sits inside an iframe, as mentioned in the answer.
<iframe data-bind="attr: { src: src, foo: $root.registerTargetDisplayFrame($data, $element) }, event: {load: function() {loaded(true);}, focus: $root.blurredNavigationPane}" src="https://products.com/InfoShareAuthor/home">
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" "http://www.w3.org/TR/html4/frameset.dtd">
<html>
<head>code here
<frameset id="IshTop" class="infoshareauthor" framespacing="0" border="0" bordercolor="#FFFFFF" frameborder="0" rows="31,25,*,0">
<frame id="MenuBar" scrolling="no" name="MenuBar" src="./MainMenuBar.asp">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<body>
<div id="Top-Menu-Container">
<div id="top-menu-wrapper">
<div id="top-menu">
<form name="MainBar">
<script type="text/javascript" language="javascript">
<table cellspacing="0" cellpadding="0" border="0">
<tbody>
<tr>
<td width="95" valign="bottom">
<td width="95" valign="bottom">
<div style="POSITION: relative;">
<div height="30" style="POSITION: absolute; z-index:0; top: 4px; margin-left: -5px">
<a href="javascript:TabSelect(1);">
<img border="0" src="./UIFramework/tab_active.png">
</a>
</div>
<div onclick="javascript:TabSelect(1);" style="POSITION: absolute; z-index:2; top: -8px">
<table cellspacing="0" cellpadding="0" border="0">
<tbody>
<tr>
<td id="MenuButton1" class="tab_active" width="95" valign="bottom" height="30" align="center" style="cursor:pointer;padding-bottom:2px;" name="Repository">Repository</td>
</tr>
</tbody>
</table>
</div>
</div>
</td>
<td width="95" valign="bottom">
<td width="95" valign="bottom">
<td width="95" valign="bottom">
<td width="95" valign="bottom">
</tr>
</tbody>
</table>
</form>
</div>
<div id="top-help">
<div id="top-nav-links">
</div>
</div>
</body>
</html>
</frame>
<frame id="BreadCrumbs" frameborder="0" border="0" scrolling="no" name="BreadCrumbs" src="./BreadCrumbs.asp">
<frameset id="Application" bordercolor="#0099CC" frameborder="0" rows="0,*,0,0,0,0">
<frameset id="HiddenFrameSet" bordercolor="#0099CC" frameborder="0" rows="0,0,,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1">
<noframes> It looks like your browser doesn't support frames. This page requires frames in order to function. <br><br>For more information, please <a href='http://www.trisoftcms.com/en/contact-us.html' target=_blank style='white-space:nowrap'>contact us</a>. </noframes>
</frameset>
</html>
</iframe>
I switched frames using this:
iframe = browser.find_element_by_xpath("//iframe[@src='https://products.com/InfoShareAuthor/home']")
browser.switch_to.frame(iframe)
The code that I wrote:
browser.find_element_by_xpath("//td[@id='MenuButton1'][@name='Repository'][contains(text(),'Repository')]")
I could find the element using this xpath when I did a Firebug search
I also tried:
browser.find_element_by_id("MenuButton1")
and
browser.find_element_by_name("Repository")
Note: When I click the button, the URL does not change. Just a list of items in the application expands. Also, IDs and the Names are unique for the seven five menu buttons. None of the menu buttons work.
Does anyone have any idea what might be wrong? I am very new to Python and Selenium.
This doesn't exactly answer your question, but it does address what you're trying to do: it is likely you can accomplish the same task (and many others) with SDL's API client ISHRemote.
https://github.com/sdl/ISHRemote
For example, if you're looking for all the directories under '\General':
Import-Module ISHRemote
# first authenticate
$session = New-IshSession -IshPassword $password -IshUserName $username -WsBaseUrl 'https://ccms.example.com/InfoShareWS/'
# get a list of all the child folders under General
Get-IshFolder -IshSession $session -FolderPath '\General' -Recurse -Depth 2
Or if you're trying to get a list of files in a particular directory:
Import-Module ISHRemote
# first authenticate
$session = New-IshSession -IshPassword $password -IshUserName $username -WsBaseUrl 'https://ccms.example.com/InfoShareWS/'
# get all content in this folder
Get-IshFolderContent -IshSession $session -FolderPath 'General\path\to\topics'
With ISHRemote, you can also find and update publications, move content, modify metadata, etc.
Hope that helps.
You can try loading the iframe URL directly. That avoids issues with Selenium waiting for the iframe to load.
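A sketch of that suggestion: load the iframe's `src` directly, then switch into the nested `MenuBar` frame shown in the markup before looking up the button. That the nested frame is what hides the element is an inference from the posted HTML, not something confirmed in the thread. `click_repository_tab` is a hypothetical helper, and `driver` is assumed to be any Selenium WebDriver instance.

```python
IFRAME_SRC = "https://products.com/InfoShareAuthor/home"  # taken from the question

def click_repository_tab(driver, url=IFRAME_SRC):
    """Open the framed page directly and click the 'Repository' tab."""
    driver.get(url)                        # no outer iframe to wait for
    driver.switch_to.frame("MenuBar")      # the <td> lives inside this <frame>
    button = driver.find_element("id", "MenuButton1")  # "id" is the value of By.ID
    button.click()
    driver.switch_to.default_content()     # return to the top-level document
    return button
```

On a slow page an explicit wait before the frame switch may still be needed; the sketch leaves that out to keep the frame handling visible.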
