pdfkit: Images overlap - Row structure does not work - python

I have some HTML which includes four pictures in a 2x2 row/column layout (picture of HTML output).
I want to convert this into a PDF. Trying this with the following lines of code:
options = {"enable-local-file-access": None}
config = pdfkit.configuration(wkhtmltopdf=path_wkhtmltopdf)
pdfkit.from_file(path_IN,path_OUT, configuration=config,options=options)
produces the following pdf output:
Picture of pdf file output
Does anyone know how I can output a PDF that looks the same as the rendered HTML?
Currently, I am generating my 2x2 structure by calling the following HTML code
<div class = "row_container">
<div class = "row">
<p class="caption">Figure 1: Real yield vs. Duration</p>
<img src="IL_fig1.png" width = "{width}" height = "{height}">
</div>
<div class = "row">
<p class="caption">Figure 3: Inflation</p>
<img src="IL_fig3.png" width = "{width}" height = "{height}">
</div>
</div>
<div class = "row_container">
<div class = "row">
<p class="caption">Figure 2: Break-even-inflation vs. Duration</p>
<!--<p>BEI, seasonal adjustment</p>-->
<img src="IL_fig2.png" width = "{width}" height = "{height}">
</div>
<div class = "row">
<p class="caption">Figure 4: Break-even-inflation, 10Y inflation-linked bonds</p>
<img src="IL_fig4.png" width = "{width}" height = "{height}">
</div>
</div>
Together with some CSS styling:
.row_container {
display: flex;
justify-content: space-between;
-webkit-box-pack: center;
overflow: hidden;
padding: 0px 3px 0px 3px;
}
.row {
-webkit-box-flex: 1;
-webkit-flex: 1;
/*flex: 1;*/
}
I have tried to scale the pictures, but this did not help either:
pdf output with scaled images
I would like to have 2 pictures side by side in 2 rows as in the output from the HTML.
Appreciate the help.

Related

Weasy-print convert to pdf with border image

I’m trying to create a PDF file for a payment receipt, but I can't figure out how to set a border for it.
As the border I want to use this image:
But after converting it to PDF, the next page looks like this:
How can I make the border consistent across all pages?
Python + Django code:
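import os
from django.core.files import File
from django.template.loader import render_to_string
from weasyprint import HTML
html_string = render_to_string('receipt.html', DATA)
html = HTML(string=html_string)
result = html.write_pdf()
# Write the rendered PDF to a temporary file; the with block closes (and flushes) it
pdf_path = str(os.path.join(MEDIA_URL + "invoice_receipt/", 'temp.pdf'))
with open(pdf_path, 'wb') as f:
    f.write(result)
# Reopen the file for reading and attach it to the transaction
file_obj = File(open(pdf_path, 'rb'))
transaction.receipt_file = file_obj
transaction.save()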
receipt.html template:
<style>
table tbody tr td{
border-top: unset !important;
}
table tbody tr:nth-child(7) td,
table tbody tr:nth-child(8) td,
table tbody tr:nth-child(9) td,
table tbody tr:nth-child(10) td,
table tbody tr:nth-child(11) td,
table tbody tr:nth-child(12) td
{
padding-top: 0;
padding-bottom: 0;
}
.amount-in-words{
border-bottom:3px solid black;
}
.table thead th {
vertical-align: bottom;
border-bottom: 4px solid black;
}
/* .invoice-template{
padding: 20px;
border: 20px solid transparent;
border-image: linear-gradient(to right,#633363 50%,#f3c53d 50%);
border-image-slice: 1;
} */
.logo{
margin-top: 2rem;
}
.logo2{
margin-top: 2rem;
height: 160px;
width:200px;
}
.invoice-template{
padding: 20px;
background-image: url('https://dev-api.test.com/files/files/DumpData/Frame.png');
background-repeat: no-repeat;
background-size: contain;
break-inside: auto;
}
.main-container{
border: 1px solid black;
padding: 20px 10px;
background: white;
}
p {
font-weight: 500;
}
</style>
</head>
<body>
<div class="container invoice-template">
<!-- <div class="main-container"> -->
<div class="row justify-content-center">
<div class="col-md-5 logo"><img src={{ logo }} class="logo2"></div>
<div class="col-md-5 text-right">
<ul style="list-style: none; color: purple; margin-top: 2rem;">
<li>{{ phone }}<span></span></li>
<li><p>{{ email }}<br>{{ website }}</p><span></span></li>
<li>Resource Factory Pvt. Ltd.<br>{{ shop_address|linebreaksbr }}<span></span></li>
</ul>
</div>
</div>
<div class="row text-center">
<div class="col-md-12"><h6>INVOICE</h6></div>
</div>
<div class="row justify-content-center">
<div class="col-md-5">
<p>
To,<br>
{{ user_name }}<br>
{{ user_address|linebreaksbr }}
</p>
<p>Client GST Number.:</p>
</div>
<div class="col-md-5 text-center">
<p>Date: {{ order_date|date:"d-m-Y" }}</p>
<p>Invoice No. {{ invoice }}</p>
</div>
</div>
I’m giving a short version of my HTML code. If the full code is needed, please mention it.
The behavior of box decorations when a box is split (like your main <div> here) is controlled by box-decoration-break. The default is slice, which breaks the borders after rendering them. clone will compute the borders on each part of the box:
.invoice-template {
box-decoration-break: clone;
}

Selenium Python Element not found

This is a link to HTML I want to scrape
https://pk.khaadi.com/unstitched/r20206-red-r20206-red-pk.html
<div class="swatch-attribute-options clearfix">
<div class="swatch-option color selected" option-type="1" option-id="61" option-label="RED" option-tooltip-thumb="" option-tooltip-value="#ee0000" "="" style="background: #ee0000 no-repeat center; background-size: initial;">
</div>
<div class="swatch-option color selected" option-type="1" option-id="73" option-label="YELLOW" option-tooltip-thumb="" option-tooltip-value="#feed00" "="" style="background: #feed00 no-repeat center; background-size: initial;">
</div>
</div>
Color = S_Driver.find_elements_by_xpath('//*[@id="product-options-wrapper"]/div/div/div[1]/div')
This XPath points to the outer div that contains both color divs.
for c in Color:
    n_Color.append(c.get_attribute('option-label'))
print(n_Color)
This is how I tried to extract the colors through the 'option-label' attribute.
Change the XPath to:
//div[@class='swatch-option color']
This is based on the provided screenshot; hopefully there are no other matches on the page. If there are, change it to:
//div[@class='swatch-option color' and @option-type='1']
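One caveat, going only by the snippet in the question: the class attribute there is "swatch-option color selected", and an exact @class comparison matches the whole attribute string, so the expression above may need adjusting. A minimal sketch using the standard library's xml.etree.ElementTree on a cleaned-up copy of that markup (the stray "="" attribute is dropped so it parses as XML; this is an illustration, not the live page) shows the difference, and that selecting on option-type alone picks up both swatches:

```python
import xml.etree.ElementTree as ET

# Cleaned-up copy of the markup from the question (assumption: simplified
# so that it is well-formed XML; the real page may differ).
snippet = """
<div class="swatch-attribute-options clearfix">
  <div class="swatch-option color selected" option-type="1" option-id="61" option-label="RED"></div>
  <div class="swatch-option color selected" option-type="1" option-id="73" option-label="YELLOW"></div>
</div>
"""

root = ET.fromstring(snippet)

# Exact comparison against the full class string: "selected" is missing,
# so nothing matches.
exact = root.findall(".//div[@class='swatch-option color']")

# Selecting on the option-type attribute instead finds both color divs.
by_type = root.findall(".//div[@option-type='1']")
labels = [div.get("option-label") for div in by_type]

print(len(exact))
print(labels)
```

In Selenium the same attribute predicate would go into the XPath string passed to the driver.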

Comparing two HTML files using Python difflib package

I am having issues comparing two HTML files using Python's difflib. I was able to generate a comparison file that highlights the changes, but when I open it, it displays the raw HTML and CSS tags instead of the plain text.
E.g.
<Html><Body><div class="a"> Text Here</div></Body></html>
instead of
Text Here
My Python Script is as follows:
import difflib
file1 = open('file1.html', 'r').readlines()
file2 = open('file2.html', 'r').readlines()
diffHTML = difflib.HtmlDiff()
htmldiffs = diffHTML.make_file(file1,file2)
with open('Comparison.html', 'w') as outFile:
    outFile.write(htmldiffs)
My input files looks something like this
<!DOCTYPE html>
<html>
<head>
<title>Text here</title>
<style type="text/css">
@media all {
h1 {
margin: 0px;
color: #222222;
}
#page-title {
color: #222222;
font-size: 1.4em;
font-weight: bold;
}
body {
font: 0.875em/1.231 tahoma, geneva, verdana, sans-serif;
padding: 30px;
min-width: 800px;
}
.divider {
margin: 0.5em 15% 0.5em 10%;
border-bottom: 1px solid #000;
}
}
.section.header {
background-color: #303030;
color: #ffffff;
font-weight: bold;
margin: 0 0 5px 0;
padding: 5px 0 5px 5px;
}
.section.subheader {
background-color: #CFCFCF;
color: #000;
font-weight: bold;
padding: 1px 5px;
margin: 0px auto 5px 0px;
}
.response_rule_prefix {
font-style: italic;
}
.exception-scope
{
color: #666666;
padding-bottom: 5px;
}
.where-clause-header
{
color:#AAAA99;
}
.section {
padding: 0em 0em 1.2em 0em;
}
#generated-Time {
padding-top:5px;
float:right;
}
#page-title, #generated-Time {
display: inline-block;
}
</style></head>
<body>
<div id="title-section" class="section ">
<div id="page-branding">
<h1>Title</h1>
</div>
<div id="page-title">
Sub title
</div>
<div id="generated-Time">
Date & Time : Jul 2, 2020 2:42:48 PM
</div>
</div>
<div class="section header">General</div>
<div id="general-section" class="section">
<div class="general-detail-label-container">
<label id="policy-name-label">Text here</label>
</div>
<div class="general-detail-content-container">
<span id="policy-name-content" >Text here</span>
</div>
<div class="general-detail-label-container">
<label id="policy-description-label">Description A :</label>
</div>
<div class="general-detail-content-container">
<span id="policy-description-content""></span>Text here</span>
</div>
<div class="general-detail-label-container">
<label id="policy-label-label" class="general-detail-label">Description b:</label>
</div>
<div class="general-detail-content-container">
<span id="policy-label-content" class="wrapping-text"></span>
</div>
<div class="general-detail-label-container">
<label id="policy-group-label" class="general-detail-label">Group:</label>
</div>
<div class="general-detail-content-container">
<span id="policy-group-content" class="wrapping-text">Text here</span>
</div>
<div class="general-detail-label-container">
<label id="policy-status-label" class="general-detail-label">Status:</label>
</div>
<div class="general-detail-content-container">
<span id="policy-status-content">
<label id="policy-status-message">Active</label>
</span>
</div>
<div class="general-detail-label-container">
<label id="policy-version-label" class="general-detail-label">Version:</label>
</div>
<div class="general-detail-content-container">
<span id="policy-version-content" class="wrapping-text">7</span>
</div>
<div class="general-detail-label-container">
<label id="policy-last-modified-label" class="general-detail-label">Last Modified:</label>
</div>
<div class="general-detail-content-container">
<span id="policy-last-modified-content" class="wrapping-text">Jun 15, 2020 2:41:48 PM</span>
</div>
</div>
</body>
</html>
Assuming that you are only looking for text changes and not changes to the HTML, you could strip the HTML from the output after the comparison. There are a number of ways to achieve this. The two that I first thought of were:
RegEx, because this is native in Python, and
BeautifulSoup, because it was created to read webpages
Create a function, using either of the above methods, to strip the HTML from the output,
e.g. using BeautifulSoup
UPDATE
Reading through the documentation again, it seems to me that the comparison will yield the actual HTML too, however, you could create an additional output that only shows the text changes.
Also to avoid showing the whole document, I've set the context parameter to True
Using BeautifulSoup
import difflib
from bs4 import BeautifulSoup, Comment
def remove_html_bs(raw_html):
    "Returns the visible text nodes of an HTML fragment"
    data = BeautifulSoup(raw_html, 'html.parser')
    data = data.findAll(text=True)
    def visible(element):
        # Skip text that lives inside non-rendered tags
        if element.parent.name in ['style', 'script', '[document]', 'head', 'title']:
            return False
        # Skip HTML comments
        elif isinstance(element, Comment):
            return False
        return True
    result = list(filter(visible, data))
    return result
def compare_files(file1, file2):
    "Creates two comparison files"
    file1 = file1.readlines()
    file2 = file2.readlines()
    # Difference line by line - HTML
    difference_html = difflib.HtmlDiff(tabsize=8).make_file(file1, file2, context=True, numlines=5)
    # Difference line by line - raw
    difference_file = set(file1).difference(file2)
    # List of differences by line index
    difference_index = []
    for dt in difference_file:
        for index, t in enumerate(file1):
            if dt in t:
                difference_index.append(f'{index}, {remove_html_bs(dt)[0]}, {remove_html_bs(file2[index])[0]}')
    # Write entire line with changes
    with open('comparison.html', 'w') as outFile:
        outFile.write(difference_html)
    # Write only text changes, by index
    with open('comparison.txt', 'w') as outFile:
        outFile.write('LineNo, File1, File2\n')
        outFile.write('\n'.join(difference_index))
    return difference_html, difference_file
# Open both inputs and close them again once the comparison is done
with open('file1.html', 'r') as file1, open('file2.html', 'r') as file2:
    difference_html, difference_file = compare_files(file1, file2)
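An alternative worth considering is to strip the HTML before the comparison rather than after it, so difflib only ever sees the visible text. The following is a rough sketch of that idea using only the standard library; the VisibleText class, the visible_lines helper and the two sample strings are made up for illustration and are not part of the code above:

```python
import difflib
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects text nodes, ignoring content of non-rendered tags."""

    SKIP = {"style", "script", "head", "title"}

    def __init__(self):
        super().__init__()
        self.lines = []
        self._skip_depth = 0  # > 0 while inside a skipped tag

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.lines.append(text)


def visible_lines(html_text):
    """Return the visible text of an HTML document as a list of lines."""
    parser = VisibleText()
    parser.feed(html_text)
    parser.close()
    return parser.lines


# Hypothetical sample inputs: only the class and the version number differ.
old = '<html><head><title>T</title></head><body><div class="a">Version: 7</div></body></html>'
new = '<html><head><title>T</title></head><body><div class="b">Version: 8</div></body></html>'

# The diff is computed on text only, so the class change is not reported.
diff = list(difflib.unified_diff(visible_lines(old), visible_lines(new), lineterm=""))
print("\n".join(diff))
```

The two lists from visible_lines could just as well be passed to difflib.HtmlDiff().make_file to get the side-by-side HTML report without any markup in the compared lines.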

Vertical Scroll bar django-tables2

Can anyone tell me how to add a vertical scroll bar to django-tables2 instead of having
Page 1 of 2 Next 25 of 49 vehicles
at the bottom of the table.
tables.py
'''
Created on 28 Oct 2016
@author: JXA8341
'''
import django_tables2 as tables
from .models import Vehicle
class CheckBoxColumnWithName(tables.CheckBoxColumn):
    @property
    def header(self):
        return self.verbose_name
class VehicleTable(tables.Table):
    update = tables.CheckBoxColumn(accessor="pk",
                                   attrs = { "th__input":{"onclick": "toggle(this)"}},
                                   orderable=False)
    class Meta:
        model = Vehicle
        fields = ('update', 'vehid')
        # Add class="paleblue" to <table> tag
        attrs = {'class':'paleblue'}
screen.css
table.paleblue + ul.pagination {
font: normal 11px/14px 'Lucida Grande', Verdana, Helvetica, Arial, sans-serif;
overflow: scroll;
margin: 0;
padding: 10px;
border: 1px solid #DDD;
list-style: none;
}
div.table-container {
display: inline-block;
position:relative;
overflow:auto;
}
table.html
<div class='vehlist'>
<script language="JavaScript">
function toggle(source) {
checkboxes = document.getElementsByName('update');
for(var i in checkboxes)
checkboxes[i].checked = source.checked;
}
</script>
<form action="/loadlocndb/" method="POST" enctype="multipart/form-data">
{% csrf_token %}
{% render_table veh_list %}
<h4> Location database .csv file</h4>
{{ form.locndb }}
<input type="submit" value="Submit" />
</form>
</div>
I've looked all over but I can't seem to get a straight answer, or is there a better table module I can use to display the array and checkboxes?
For anyone in the same boat, I figured it out.
I disabled pagination
RequestConfig(request, pagination=False).configure(veh_list)
then I wrapped the table in a <div> in the html template
<div style="width: 125px; height: 500px; overflow-y: scroll;">
{% render_table veh_list %}
</div>
The <div> then adds a scrollbar to the whole table. I personally would have liked to keep the header fixed at the top, but this is the best solution I could come up with.

Python html div class

I'm trying to write a simple program which saves the values of a table in a matrix (later I want to send the matrix to a database).
Here is my code:
from urllib.request import urlopen
import requests
from bs4 import BeautifulSoup
pfad = "https://business.facebook.com/ads/manager/account/ads/?act=516059741896803&pid=p2&report_spec=6056690557117&business_id=401807279988717"
html = urlopen(pfad)
r = requests.get(pfad)
soup = BeautifulSoup(html.read(), 'html.parser')
mydivs = soup.findAll("div", { "class" : "ellipsis_1ha3" })
# no output:
for div in mydivs:
    if (div["class"] == "ellipsis_1ha3"):
        print(div)
# output: []
print(mydivs)
I want the values inside the divs with class ellipsis_1ha3, but I don't know why it doesn't work. Can anyone help me?
Here is an example html which is like the original
<!DOCTYPE html>
<html>
<head>
<style>
.ellipsis_1ha3
{
width: 100px;
border: 1px solid black;
}
.a
{
width: 100px;
border: 1px solid black;
}
</style>
</head>
<body>
<div>
<div style="display: inline-flex;">
<div class="a">Purchase</div>
<div class="a">Clicks</div>
</div>
</br>
<div style="display: inline-flex;">
<div class="ellipsis_1ha3">20</div>
<div class="ellipsis_1ha3">30</div>
</div>
</br>
<div style="display: inline-flex;">
<div class="ellipsis_1ha3">10</div>
<div class="ellipsis_1ha3">50</div>
</div>
</div>
</body>
</html>
SECOND EXAMPLE
pfad = "http://www.bundesliga.de/de/liga/tabelle/"
html = urlopen(pfad)
soup = BeautifulSoup(html.read(),'html.parser')
mydivs = soup.findAll('div', { 'class' : 'wwe-cursor-pointer' })
for div in mydivs:
    if ("wwe-cursor-pointer" in div["class"]):
        print(div)
Try using lxml and XPath expressions to pull out the relevant information. BeautifulSoup can use lxml as its parser, I believe. Assuming you loaded the document into a string called html_string:
from lxml import html
h = html.fromstring(html_string)
h.xpath('//div[@class="ellipsis_1ha3"]/node()')
#output:
['20', '30', '10', '50']
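Since the goal in the question was a matrix, the flat list returned by the XPath query can be grouped into rows afterwards. A minimal sketch, assuming two values per row as in the example HTML (the Purchase and Clicks columns); n_cols and matrix are names made up here:

```python
# Flat list as returned by the XPath query on the example HTML
values = ['20', '30', '10', '50']

# Assumption: each table row holds two values (Purchase, Clicks)
n_cols = 2

# Slice the flat list into consecutive rows and convert the strings to ints
matrix = [
    [int(v) for v in values[i:i + n_cols]]
    for i in range(0, len(values), n_cols)
]

print(matrix)
```

Each inner list then corresponds to one row of the table and can be written to a database row by row.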
