extract log file data and input directly into xhtml body - python

I currently have a Python script that takes a log file and strips any defined 'excluded' keywords from it. After extracting the required lines, I want to insert them into the "body" section of a pre-built XHTML file.
Is there a way to accomplish this?
My code for writing the extracted log data to the XHTML file is below, but it currently overwrites the XHTML file (which I expect; this is where I am stuck).
I have read up on BeautifulSoup, but I don't want to go down that path; I want to keep this strictly within the Python script itself (if possible).
contents = open(r'\path\to\file.log', 'r')
with open("output.html", "w") as writehtml:
    for lines in contents.readlines():
        writehtml.write("<pre>" + lines + "</pre> <br>\n")
The formatting I have for my XHTML page within the section is as follows:
<body>
<tr>
<td bgcolor="#ffffff" style="padding: 40px 30px 40px 30px;">
<table border="1" cellpadding="0" cellspacing="0" width="100%%">
<tr>
<td style="padding: 10px 0 10px 0; font-family: Calibri, sans-serif; font-size: 16px;">
<!-- Body text from file goes here-->
Body Text Replaces Here
</td>
</tr>
</table>
</td>
</tr>
</table>
</td>
</tr>
</body>
Thanks.

How about this?
# Read the template parts and splice the log lines in between them
contents = open(r'\path\to\file.log', 'r')
# Suppose the beginning of your template is stored in the file \path\template\start.txt
start = '''
<body>
<tr>
<td bgcolor="#ffffff" style="padding: 40px 30px 40px 30px;">
<table border="1" cellpadding="0" cellspacing="0" width="100%%">
<tr>
<td style="padding: 10px 0 10px 0; font-family: Calibri, sans-serif; font-size: 16px;">
'''
# start = open(r'\path\template\start.txt', 'r').read()
# Assume the end of your template is stored in the file \path\template\end.txt
end = '''
</td>
</tr>
</table>
</td>
</tr>
</table>
</td>
</tr>
</body>
'''
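# end = open(r'\path\template\end.txt', 'r').read()
# Note: "a" appends to output.html on every run; use "w" to regenerate the page from scratch
with open("output.html", "a") as writehtml:
    writehtml.write(start)
    for lines in contents.readlines():
        writehtml.write("<pre>" + lines + "</pre> <br>\n")
    writehtml.write(end)
contents.close()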
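An alternative that keeps the XHTML template in a single file is to read it as text and replace the placeholder line from the question. The following is only a sketch under stated assumptions: the function names and the file paths passed to `build_page` are hypothetical, and the placeholder is taken to be the literal text "Body Text Replaces Here". Log lines are passed through `html.escape` so that characters such as `<` and `&` do not break the markup.

```python
import html


def render_log_lines(lines):
    """Wrap each log line in <pre>, escaping characters such as < and &."""
    return "".join(
        "<pre>" + html.escape(line.rstrip("\n")) + "</pre> <br>\n" for line in lines
    )


def fill_template(template, body, placeholder="Body Text Replaces Here"):
    """Return the template with the first occurrence of the placeholder replaced."""
    if placeholder not in template:
        raise ValueError("placeholder not found in template")
    return template.replace(placeholder, body, 1)


def build_page(template_path, log_path, output_path):
    """Read the template and the log, then write the filled-in page (paths are caller-supplied)."""
    with open(template_path, encoding="utf-8") as f:
        template = f.read()
    with open(log_path, encoding="utf-8") as f:
        body = render_log_lines(f)
    with open(output_path, "w", encoding="utf-8") as f:
        f.write(fill_template(template, body))
```

Calling `build_page` with the template, log, and output paths writes a fresh page each time, so the template file itself is never overwritten.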

Related

QTooltip with "structured" html

I have a PyQt6 application that features a custom text editor.
When the user hovers over a word in this editor, a custom QToolTip is displayed.
I would like to make it fancier than the default one, with the following structure:
******* TITLE
* *
* IMG * - some text
* * - some other text
*******
I'm a complete beginner when it comes to HTML. I tried some code using <div> and <p> blocks; it rendered the way I wanted when loaded in a browser, but the result was not as expected in the application.
It seems that, although the documentation states that Qt supports HTML blocks, what I want to achieve might be impossible.
Does anyone have a clue about what I could do to make it work? Below is an example of what I tried.
<div style="background-color: #2F3135;font-family: Franklin Gothic;font-size: 12;">
<div style="float: left;background-color: #2F3135;padding: 30px 20px 30px 30px;"><img src=MY_IMAGE width="64" height="64"/>
</div>
<p style="color: #FFFFFF;line-height:135%"><b><span style="background-color: #009900">TITLE:</b></span><br>
<span style="background-color: #009900;">some text<br></span>
<span style="background-color: #009900;">some other text<br></span>
</p>
</div>
EDIT
Following the answer from musicamante, I tried using a table rather than div blocks.
As I said in my reply to him, it works unless the title word of the second row is too long. Below is a reproducible QToolTip example:
<table style="background-color: #454850;">
<tr>
<th rowspan=7 style="vertical-align: middle;padding-left: 20px;padding-right: 15px"><img src=MYIMAGE width="64" height="64"/></th>
<th><b><span style="color: #CECED7;font-family: Verdana, sans-serif;font-size: 10;">MAIN_TITLE</span></b></th>
</tr>
<tr>
<td></td>
</tr>
<tr>
<td><b><span style="background-color: #307D30;color: #BACABA;font-family: Verdana, sans-serif;font-size: 10;">Inputs:</b></span></td>
</tr>
<tr>
<td><span style="background-color: #913131;color: #D0BFBF;font-family: Verdana, sans-serif;font-size: 10;">blabla</span></td>
</tr>
<tr>
<td></td>
</tr>
<tr>
<td><b><span style="background-color: #913131;color: #D0BFBF;font-family: Verdana, sans-serif;font-size: 10;">Outputs:</b></td>
</tr>
<tr>
<td><span style="background-color: #913131;color: #D0BFBF;font-family: Verdana, sans-serif;font-size: 10;">blabla</span></td>
</tr>
</table>
So if MAIN_TITLE is longer than 32 characters in my case, the second column does not display properly in the QToolTip.
Maybe I messed up the HTML code (I'm very new to HTML and have never really worked with it).
Any tips are welcome!
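For reference, the table layout from the EDIT can also be generated from Python, which makes it easier to keep tags balanced and to experiment with different title lengths. This is a minimal sketch, not a fix for the long-title problem: `build_tooltip_html` is a hypothetical helper name, and the styling is reduced to the bare structure (image cell spanning all rows on the left, one text row per line on the right).

```python
import html


def build_tooltip_html(title, image_path, lines):
    """Build a two-column table: the image spans all rows, text goes on the right."""
    rows = [title] + list(lines)
    image_cell = (
        f'<td rowspan="{len(rows)}" style="vertical-align: middle;">'
        f'<img src="{html.escape(image_path, quote=True)}" width="64" height="64"/></td>'
    )
    # The first row carries the image cell and the bold title.
    parts = ["<table>", f"<tr>{image_cell}<td><b>{html.escape(title)}</b></td></tr>"]
    # Every following row only needs the text cell.
    for text in lines:
        parts.append(f"<tr><td>{html.escape(text)}</td></tr>")
    parts.append("</table>")
    return "\n".join(parts)
```

The returned string can then be passed as the text argument of `QToolTip.showText`.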

How to use border-radius while converting html to pdf using xhtmltopdf

I am trying to round the corners of my table, but border-radius doesn't seem to work when I convert the HTML below to PDF using the xhtmltopdf PDF generator. Below is the HTML content; the file name is sticker_print.html:
<div class="sticker" style="height:196px">
<table class="sticker_box" align="left">
<tr>
<td style="border: 1px solid #222;background-color: #ffffff;">
<h3 style="border-bottom: 1px solid #222222;">Batch Sticker</h3>
<h5 style="padding: 0 0 0 10px;">Batch ID</h5>
<p>MFG Date</p>
<p style="padding-bottom:0px;"><img src="http://www.computalabel.com/Images/C128ff#2x.png" width="195px" height="26px"><span> Bar Code </span></p>
<p style="text-align: left; padding-bottom: 0px;">
<img src="https://www.kaspersky.com/content/en-global/images/repository/isc/2020/9910/a-guide-to-qr-codes-and-how-to-scan-qr-codes-2.png" width="65px" height="65px">
<span style="display: block;margin-top: 0px;">QR Code</span>
</p>
</td>
</tr>
</table>
</div>
PDF CODE
pdf = render_to_pdf('sticker_print.html')
return HttpResponse(pdf, content_type='application/pdf')
Even though I'm not using the same PDF engine as you (and your question is 6 months old), I solved this issue by using corner-radius instead of border-radius on a table cell or div.

Weasyprint and CSS: header, footer, pagebreak and positioning

I am building an invoice report template in HTML and using Weasyprint to generate it as a PDF (and eventually as a docx).
The issue I'm having is that I can neither page-break properly nor generate a running header and footer without the body contents overlapping them and turning my data into zalgo text.
My report template has this simple format:
+==========================+
+ Header +
+==========================+
+ Body +
+==========================+
+ Footer +
+==========================+
Both the header and footer will be present on more or less every page. The header includes a page counter, while the footer displays a value in a text box only on the last page.
Both my header and footer live in separate HTML templates for versatility and are pulled in with the include keyword. As this is a template for an invoice, the header is closer to a letterhead.
The main content is in the body. If there is too much content, it should break and continue on the next page.
For all 3 parts, I am using tables for formatting purposes, mainly to keep my data aligned.
Here is a sample of my main HTML body:
<!DOCTYPE html>
<html>
<head>
<style type="text/css" media="all">
@page {
    size: A4 portrait; /* can use also 'landscape' for orientation */
    margin: 1cm;
    @top-left {
        content: element(header);
    }
    @bottom-left {
        content: element(footer);
    }
}
header, footer, .body_content{
font-size: 12px;
/* color: #000; */
font-family: Arial;
width: 100%;
position: relative;
}
header {
/*position: fixed;*/
/*position: running(header); */
/*display: block; */
}
footer {
position: fixed;
/*position: running(footer);*/
/*position: absolute;*/
bottom: 0;
/*display: block;*/
}
.body_content {
position: relative;
page-break-inside: auto;
height: 320pt;
/*overflow: hidden;*/
}
</style>
</head>
<body>
<header>
{% include 'sampleTemplate_header.html' %}
</header>
<div >
<table class="body_content">
<tbody >
<tr style="padding-top:5px;" >
<td style="width:60%;" >
</td>
<td style="width:10%;" >
</td>
<td style="width:15%;" >
</td>
<td style="width:15%;" >
</td>
</tr>
<tr>
</tr>
<tr >
<td style="width:60%;" >
</td>
<td style="width:10%;" >
</td>
<td style="width:15%;" align="right" >
</td>
<td style="width:15%;" align="center" >
</td>
</tr>
<tr >
<td style="width:60%;" id="testCell" >
</td>
<td style="width:10%;" >
</td>
<td style="width:15%;" >
</td>
<td style="width:15%;" >
</td>
</tr>
</tbody>
</table>
</div>
<footer>
{% include 'sampleTemplate_footer.html' %}
</footer>
</body>
</html>
The CSS portion has a lot of commented-out code from my experiments with the layout, but however much I change, I can't seem to get the layout I need.
One of my most persistent issues has been the body content overlapping with the header or the footer. The latter happens even with a forced page-break-after.
I got it working with running elements (I was reading about them here: https://www.w3.org/TR/css-gcpm-3/#running-elements).
If you put the footer before the main content, it will show on every page. Running elements apparently move an element out of the main flow and into the margin, so I guess that if the element hasn't appeared in the page yet, it can't be moved into the margin.
To stop the header/footer from overlapping with the contents, I had to play around with the margin value of the @page rule. Since a running element is moved into the margin, making the margin bigger gives it more space. In the example below, if you decrease the top (or bottom) margin, the header (or footer) will overlap.
Sometimes I have to set the header/footer height to get it positioned in the margin properly, but I didn't have to for this example.
<!DOCTYPE html>
<html>
<head>
<style type="text/css" media="all">
@page {
    size: A4 portrait; /* can use also 'landscape' for orientation */
    margin: 100px 1cm 150px 1cm;
    @top-left {
        content: element(header);
    }
    @bottom-left {
        content: element(footer);
    }
}
header {
position: running(header);
/*height: 100px;*/
}
footer {
position: running(footer);
/*height: 150px;*/
}
</style>
</head>
<body>
<header>
multiline<br>header<br>lots<br>of<br>lines<br>here<br>
</header>
<footer>
multiline<br>footer<br>lots<br>of<br>lines<br>here
</footer>
<div >
stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff<br>stuff
</div>
</body>
</html>
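The same layout can be assembled and rendered from Python. The sketch below makes some assumptions: `build_invoice_html` and `write_pdf` are hypothetical helper names, the margins default to the values used in the example above, and WeasyPrint is imported lazily so that the HTML can be built and inspected without the package installed. The footer is emitted before the body, as described above.

```python
def build_invoice_html(header, footer, body, top_margin="100px", bottom_margin="150px"):
    """Return an HTML page whose header and footer are CSS running elements.

    The footer is placed before the body so it is available on every page.
    """
    return f"""<!DOCTYPE html>
<html>
<head>
<style>
@page {{
    size: A4 portrait;
    margin: {top_margin} 1cm {bottom_margin} 1cm;
    @top-left {{ content: element(header); }}
    @bottom-left {{ content: element(footer); }}
}}
header {{ position: running(header); }}
footer {{ position: running(footer); }}
</style>
</head>
<body>
<header>{header}</header>
<footer>{footer}</footer>
<div>{body}</div>
</body>
</html>"""


def write_pdf(html_text, target):
    """Render the page with WeasyPrint (requires the weasyprint package)."""
    from weasyprint import HTML  # imported here so the builder works without it

    HTML(string=html_text).write_pdf(target)
```

If the header or footer grows, the matching margin argument most likely has to grow with it, since the running element is drawn inside the page margin.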

can't get text from SPAN tag

The structure of the website I'm trying to parse looks like this:
<table border="0" cellpadding="3" cellspacing="0" width="100%">
<tr height="25">
<td class="th" style="border:none" width="2%"> </td>
<td class="th">movie</td>
<td class="th"> </td>
<td class="th"> </td>
</tr>
<tr id="place_1">
<td style="color: #555; vertical-align: top; padding: 6px">
<a name="1"></a>1.
</td>
<td style="height: 27px; vertical-align: middle; padding: 6px 30px 6px 0">
<a class="all" href="/326/">MOVIE TITLE IN SPANISH</a>
<br/>
<span class="text-grey">MOVIE TITLE IN ENGLISH</span>
</td>
<td style="width: 85px">
<div style="width: 85px; position: relative">
<a class="continue" href="/326/votes/">
9.191
</a>
<span style="color: #777">
(592 184)
</span>
</div>
</td>
</tr>
...
...
...
The problem is that I can't get the text inside the span tag. I've tried .text, as for the a tag, and also .get_text(), but neither worked. My Python code:
for row in table.find_all('tr')[1:]:
    info = row.find_all('td')
    movies.append({
        'spn_title' : info[1].a.text,
        'eng_title' : info[1].span.text,
    })
The errors I get:
AttributeError: 'NoneType' object has no attribute 'get_text'
or
'eng_title' : info[1].span.text
AttributeError: 'NoneType' object has no attribute 'text'
Try the following. Also, check your soup variable, because I can run your code without a problem. I suspect that somewhere later in the HTML, a row is missing one of these elements.
If the class names are consistent, you could filter for only the qualifying rows that contain elements of the appropriate type with those classes. This uses bs4 4.7.1.
for row in table.select('tr :has(span.text-grey):has(a.all)'):
    movies.append({
        'spn_title' : row.select_one('.all').text,
        'eng_title' : row.select_one('.text-grey').text
    })
print(movies)
Otherwise, you want a way to handle if not present. For example,
for row in table.find_all('tr')[1:]:
    movies.append({
        'spn_title' : row.select_one('.all').text if row.select_one('.all') is not None else 'None',
        'eng_title' : row.select_one('.text-grey').text if row.select_one('.text-grey') is not None else 'None'
    })
print(movies)
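The repeated `select_one` lookups in the second variant can be factored into a small helper. This is a sketch with hypothetical names (`text_or_default`, `extract_movies`); it assumes `table` is a bs4 Tag and only relies on its `find_all`, `select_one`, and `get_text` methods. Unlike the snippet above, it returns `None` rather than the string 'None' for a missing element, which is a deliberate change.

```python
def text_or_default(row, selector, default=None):
    """Return the stripped text of the first match for selector, or default if none."""
    node = row.select_one(selector)
    return node.get_text(strip=True) if node is not None else default


def extract_movies(table):
    """Collect both titles from every row after the header row."""
    movies = []
    for row in table.find_all('tr')[1:]:
        movies.append({
            'spn_title': text_or_default(row, '.all'),
            'eng_title': text_or_default(row, '.text-grey'),
        })
    return movies
```

Rows where a title is missing then show up as `None` entries, which makes it easy to locate the row that triggers the AttributeError.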
I think that you should use innerHTML.
info[1].getElementsByTagName('span')[0].innerHTML
should work.
I had the same issue, but I was able to resolve it.
Example:
<span class="a-offscreen">$10.99</span>
Instead of Elem.FindElementByCss("span.a-offscreen").Text
use:
Elem.FindElementByCss("span.a-offscreen").FindElementByXPath("parent::*").Text
The trick is to get the text of the parent.
By the way, I am using VBA, so you will need to convert this to Python syntax.

pyfpdf write_html In-line CSS style attribute not working in fpdf python

I am trying to create a PDF file using pyfpdf in a Python Django project. With the following code snippet I am trying to generate a PDF from HTML that uses inline CSS, but the CSS styles are not rendered.
from fpdf import FPDF, HTMLMixin
import os
class WriteHtmlPDF(FPDF, HTMLMixin):
    pass
pdf = WriteHtmlPDF()
# First page
pdf.add_page()
html = f"""<h3>xxxxx</h3>
<div style="border:1px solid #000">
<table border="1" cellpadding="5" cellspacing="0">
<tr><th width=20 align="left">xxxxxxxx:</th><td width="100">xxxxxxxx</td></tr>
<tr><th width=20 align="left">xxxxxxxx:</th><td width="100">xxxxxxxxx</td></tr>
<tr><th width=20 align="left">xxxxxxxx:</th><td width="100">xxxxxxxxx</td></tr>
<tr><th width=20 align="left">xxxxxxxx:</th><td width="100">xxxxxxxxx</td></tr>
</table>
</div>
<div style="border: 1px solid; padding: 2px; font-size: 12px;">
<table>
<tr>
<td width="20">xxxxxx: 1</td>
<td width="20">xxxxxx: 0</td>
<td width="20">xxxxxx: 1</td>
</tr>
</table>
</div>"""
The PDF file gets generated, but without the CSS styling.
https://pyfpdf.readthedocs.io/en/latest/reference/write_html/index.html#details
Inline CSS is not supported in pyfpdf.
