wkhtmltopdf adds extra spacing when multiple width, height : 100% elements - python

I'm using wkhtmltopdf via subprocess to generate PDFs in a Python Flask web app. It works great for most HTML input I give it, but applies some very weird formatting in certain cases. In some cases I need to concatenate the HTML block five times, and the generated PDFs then contain multiple blank pages between the content (which I would rather not have).
The format of the HTML I provide is the following:
<body>
<center>
<table height="100%" width="100%" id="backgroundTable" style="
height: 100%;
width: 100%;">
<!-- More content -->
</table>
</center>
</body>
I have found that the culprits are the height="100%" width="100%" and style="height: 100%; width: 100%;" calls for the <table> element.
Does anyone know why that would cause the issue?
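Since the question traces the blank pages to the height declarations on the table, one possible workaround is to strip those declarations before handing the HTML to wkhtmltopdf. This is a minimal sketch that assumes dropping the 100% height is acceptable for the layout. The function name drop_full_height is hypothetical, and the regular expressions only cover the exact forms shown in the question.

```python
import re


def drop_full_height(html):
    """Remove height="100%" attributes and "height: 100%;" style declarations."""
    # Legacy attribute form: height="100%"
    html = re.sub(r'\sheight\s*=\s*"100%"', "", html, flags=re.IGNORECASE)
    # CSS form inside style="..."; the lookbehind skips min-height, line-height, etc.
    html = re.sub(r'(?<![-\w])height\s*:\s*100%\s*;?\s*', "", html, flags=re.IGNORECASE)
    return html


table_tag = '<table height="100%" width="100%" id="backgroundTable" style="\nheight: 100%;\nwidth: 100%;">'
print(drop_full_height(table_tag))
```

The width declarations are left untouched, so the table still spans the page horizontally.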

Related

Is there any possibility to extract certain HTML tags from a string using Python

For example, I have a big chunk of text with HTML tags in it, and I want a function that removes chosen HTML tags from it.
I want to delete just the tags, not the text.
The problem is more complicated than it seems: if the text contains both ol and ul tags and I want to delete ol first, the text must be kept, and the li tags must be deleted only inside the ol, not inside the ul.
I have tried BeautifulSoup and some NLP techniques, but with no success:
from bs4 import BeautifulSoup
import nltk
from nltk.tokenize import word_tokenize
html_know='''<img src="https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcSGTVf63Vm3XgOncMVSOy0-jSxdMT8KVJIc8WiWaevuWiPGe0Pm" class="image_master" alt="" style="width: 248px; height: 164px; vertical-align: middle;">
<img src="https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcSGTVf63Vm3XgOncMVSOy0-jSxdMT8KVJIc8WiWaevuWiPGe0Pm" style="width: 250px; height: 166px;">
<img src="https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcSGTVf63Vm3XgOncMVSOy0-jSxdMT8KVJIc8WiWaevuWiPGe0Pm" style="width: 249px; height: 165px;">
<img src="https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcSGTVf63Vm3XgOncMVSOy0-jSxdMT8KVJIc8WiWaevuWiPGe0Pm" style="width: 249px; height: 165px;">
<p></p>
<p><strong><span style="font-family: Impact, Charcoal, sans-serif; font-size: 36px;">HTML</span></strong></p> <span style="background-color: rgb(255, 255, 0);"><p></p>HTML stands for Hyper Text Markup Language, which is the most widely used language on Web to develop web pages. HTML was created by Berners-Lee in late 1991 but "HTML 2.0" was the first standard HTML specification which was published in 1995. HTML 4.01 was a major version of HTML and it was published in late 1999.</span>Though
HTML 4.01 version is widely used but currently we are having HTML-5 version which is an extension to HTML 4.01, and this version was published in 2012. Audience This tutorial is designed for the aspiring Web Designers and Developers with a need to understand
the HTML in enough detail along with its simple overview, and practical examples. This tutorial will give you enough ingredients to start with HTML from where you can take yourself at higher level of expertise.
<p></p>
<p>
</p>
<p></p>HTML stands for Hypertext Markup Language, and it is the most widely used language to write Web Pages. Hypertext refers to the way in which Web pages (HTML documents) are linked together. Thus, the link available on a webpage is called Hypertext.
As its name suggests, HTML is a Markup Language which means you use HTML to simply "mark-up" a text document with tags that tell a Web browser how to structure it to display. Originally, HTML was developed with the intent of defining the structure of
documents like headings, paragraphs, lists, and so forth to facilitate the sharing of scientific information between researchers. Now, HTML is being widely used to format web pages with the help of different tags available in HTML language. Basic HTML
Document In its simplest form, following is an example of an HTML document
<p></p>
<p><img src="" style="width: 119px; height: 119px;"></p>
<table style="width:100%">
<tbody>
<tr>
<th style="border-color: rgb(0, 0, 0);">Firstname</th>
<th style="border-color: rgb(0, 0, 0);">Lastname</th>
<th style="border-color: rgb(0, 0, 0);">Age</th>
</tr>
<tr>
<td style="border-color: rgb(0, 0, 0);">Jill</td>
<td style="border-color: rgb(0, 0, 0);">Smith</td>
<td style="border-color: rgb(0, 0, 0);">50</td>
</tr>
<tr>
<td style="border-color: rgb(0, 0, 0);">Eve</td>
<td style="border-color: rgb(0, 0, 0);">Jackson</td>
<td style="border-color: rgb(0, 0, 0);">94</td>
</tr>
</tbody>
</table>
<p></p>
<p></p>
<p></p>
<p></p>
<p></p>
<p></p>HTML Tags As told earlier, HTML is a markup language and makes use of various tags to format the content. These tags are enclosed within angle braces Except few tags, most of the tags have their corresponding closing tags. For example, has its closing
tag and tag has its closing tag tag etc. Above example of HTML document uses the following tags ? Sr.No Tag & Description 1 This tag defines the document type and HTML version. 2 This tag encloses the complete HTML document and mainly comprises of
document header which is represented by ... and document body which is represented by ... tags. 3 This tag represents the document's header which can keep other HTML tags like html,head,body,title,...etc
<ol>
<li>2</li>
<li>2</li>
<li>3</li>
</ol>
<ul>
<li>sdfsdf</li>
<li>s</li>
<li>dfsd</li>
<li>f</li>
<li>sd</li>
<li>f</li>
<li>sd</li>
</ul>
<p></p>
<p><iframe width="1019px" height="311px" src="//www.youtube.com/embed/uCg2BoKiuOM" frameborder="0" allowfullscreen=""></iframe></p>
<p></p>
<p></p>
<p></p>
<p></p>
<p></p>
<p></p>
<p></p>
<p></p>
<p></p>
<p></p>'''
soup=BeautifulSoup(html_know, 'html.parser')
tags=soup.find_all('table')
print(tags[0].text)
print(html_know[3])
The idea behind this is that sometimes I want to delete some tags and at other times different ones.
Please give me some ideas for doing this without hard-coding everything.
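One way to approach this is to re-emit the markup while skipping only the tags you choose. The sketch below swaps in the standard library's html.parser instead of BeautifulSoup so that it runs without extra packages. The names TagStripper, strip_tags, drop and drop_children are hypothetical, and the parser discards comments and doctype declarations.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not tracked as open.
VOID = {"img", "br", "hr", "input", "meta", "link"}


class TagStripper(HTMLParser):
    """Re-emit HTML, dropping selected tags but keeping the text inside them."""

    def __init__(self, drop, drop_children=None):
        super().__init__(convert_charrefs=False)
        self.drop = set(drop)                     # tags removed everywhere
        self.drop_children = drop_children or {}  # parent -> tags removed only inside it
        self.stack = []                           # currently open tags
        self.out = []

    def _is_dropped(self, tag):
        if tag in self.drop:
            return True
        return any(tag in self.drop_children.get(parent, ()) for parent in self.stack)

    def handle_starttag(self, tag, attrs):
        if not self._is_dropped(tag):
            self.out.append(self.get_starttag_text())
        if tag not in VOID:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        if not self._is_dropped(tag):
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Close everything up to and including the matching open tag.
            while self.stack.pop() != tag:
                pass
        if not self._is_dropped(tag):
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def strip_tags(html, drop, drop_children=None):
    parser = TagStripper(drop, drop_children)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


sample = "<ol><li>1</li><li>2</li></ol><ul><li>a</li></ul>"
print(strip_tags(sample, drop={"ol"}, drop_children={"ol": {"li"}}))
# prints: 12<ul><li>a</li></ul>
```

Because the tags to remove are passed as arguments, different calls can strip different tags without hard-coding any of them.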

Scrapy Text after <div>

I want to crawl the following HTML code using Scrapy:
<tbody id="pageData11">
<tr>
<td>
<div style="border-left:3px solid #1A8CFF !important; float: left; padding-right: 5px;"> </div>
2018-May-29 Tuesday
</td>
Strictly speaking, the answer to your question is response.xpath('/html/body/tbody/tr/td/div/following::text()').extract_first().strip(), but, in this case, it's also the text in the td. Thus you can also do something like "".join(i.strip() for i in response.css('td::text').extract()).
Considering just the example given in your question:
response.css('td::text').extract()
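Both answers rely on the date being a text node of the td that happens to follow the div. To illustrate that outside Scrapy, here is a small sketch using the standard library's xml.etree.ElementTree. The fragment is a closed-off version of the markup from the question, with the missing end tags added as an assumption and the style attribute shortened.

```python
import xml.etree.ElementTree as ET

# Assumed well-formed version of the fragment from the question.
fragment = """<tbody id="pageData11"><tr><td><div style="float: left;"> </div>
2018-May-29 Tuesday
</td></tr></tbody>"""

td = ET.fromstring(fragment).find("./tr/td")
div = td.find("div")

# Text that follows </div> is stored as the div's "tail".
after_div = div.tail.strip()

# Joining the stripped text nodes under the td gives the same string here.
joined = "".join(t.strip() for t in td.itertext())

print(after_div)  # prints: 2018-May-29 Tuesday
print(joined)     # prints: 2018-May-29 Tuesday
```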

Locating an element using Selenium with Python

Edited based on the answers:
I am using Selenium with Python and trying to locate a button in a web application in Chrome. The block of code has an iframe, as mentioned in the answer.
<iframe data-bind="attr: { src: src, foo: $root.registerTargetDisplayFrame($data, $element) }, event: {load: function() {loaded(true);}, focus: $root.blurredNavigationPane}" src="https://products.com/InfoShareAuthor/home">
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" "http://www.w3.org/TR/html4/frameset.dtd">
<html>
<head>code here
<frameset id="IshTop" class="infoshareauthor" framespacing="0" border="0" bordercolor="#FFFFFF" frameborder="0" rows="31,25,*,0">
<frame id="MenuBar" scrolling="no" name="MenuBar" src="./MainMenuBar.asp">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<body>
<div id="Top-Menu-Container">
<div id="top-menu-wrapper">
<div id="top-menu">
<form name="MainBar">
<script type="text/javascript" language="javascript">
<table cellspacing="0" cellpadding="0" border="0">
<tbody>
<tr>
<td width="95" valign="bottom">
<td width="95" valign="bottom">
<div style="POSITION: relative;">
<div height="30" style="POSITION: absolute; z-index:0; top: 4px; margin-left: -5px">
<a href="javascript:TabSelect(1);">
<img border="0" src="./UIFramework/tab_active.png">
</a>
</div>
<div onclick="javascript:TabSelect(1);" style="POSITION: absolute; z-index:2; top: -8px">
<table cellspacing="0" cellpadding="0" border="0">
<tbody>
<tr>
<td id="MenuButton1" class="tab_active" width="95" valign="bottom" height="30" align="center" style="cursor:pointer;padding-bottom:2px;" name="Repository">Repository</td>
</tr>
</tbody>
</table>
</div>
</div>
</td>
<td width="95" valign="bottom">
<td width="95" valign="bottom">
<td width="95" valign="bottom">
<td width="95" valign="bottom">
</tr>
</tbody>
</table>
</form>
</div>
<div id="top-help">
<div id="top-nav-links">
</div>
</div>
</body>
</html>
</frame>
<frame id="BreadCrumbs" frameborder="0" border="0" scrolling="no" name="BreadCrumbs" src="./BreadCrumbs.asp">
<frameset id="Application" bordercolor="#0099CC" frameborder="0" rows="0,*,0,0,0,0">
<frameset id="HiddenFrameSet" bordercolor="#0099CC" frameborder="0" rows="0,0,,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1">
<noframes> It looks like your browser doesn't support frames. This page requires frames in order to function. <br><br>For more information, please <a href='http://www.trisoftcms.com/en/contact-us.html' target=_blank style='white-space:nowrap'>contact us</a>. </noframes>
</frameset>
</html>
</iframe>
I switched frames using this:
iframe = browser.find_element_by_xpath("//iframe[@src='https://products.com/InfoShareAuthor/home']")
browser.switch_to.frame(iframe)
The code that I wrote:
browser.find_element_by_xpath("//td[@id='MenuButton1'][@name='Repository'][contains(text(),'Repository')]")
I could find the element using this xpath when I did a Firebug search
I also tried:
browser.find_element_by_id("MenuButton1")
and
browser.find_element_by_name("Repository")
Note: When I click the button, the URL does not change. Just a list of items in the application expands. Also, IDs and the Names are unique for the seven five menu buttons. None of the menu buttons work.
Does anyone have any idea what might be wrong? I am very new to Python and Selenium.
This doesn't exactly answer your question, but it does address what you're trying to do: it is likely you can accomplish the same task (and many others) with SDL's API client ISHRemote.
https://github.com/sdl/ISHRemote
For example, if you're looking for all the directories under '\General':
Import-Module ISHRemote
# first authenticate
$session = New-IshSession -IshPassword $password -IshUserName $username -WsBaseUrl 'https://ccms.example.com/InfoShareWS/'
# get a list of all the child folders under General
Get-IshFolder -IshSession $session -FolderPath '\General' -Recurse -Depth 2
Or if you're trying to get a list of files in a particular directory:
Import-Module ISHRemote
# first authenticate
$session = New-IshSession -IshPassword $password -IshUserName $username -WsBaseUrl 'https://ccms.example.com/InfoShareWS/'
# get all content in this folder
Get-IshFolderContent -IshSession $session -FolderPath 'General\path\to\topics'
With ISHRemote, you can also find and update publications, move content, modify metadata, etc.
Hope that helps.
You can try loading the iframe URL directly. That avoids issues with Selenium waiting for the iframe to load.
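Judging from the markup in the question, td#MenuButton1 sits in the frame named MenuBar, which is itself nested inside the outer iframe, so switching into the iframe alone may not be enough. The sketch below switches twice before locating the button. The function name click_in_nested_frame is hypothetical, and it passes the locator strategies as the plain strings "xpath" and "id" (the values of Selenium's By.XPATH and By.ID) so that the snippet itself needs no imports.

```python
def click_in_nested_frame(driver, iframe_src, frame_name, element_id):
    """Switch into an iframe, then into a frame inside it, and click an element."""
    driver.switch_to.default_content()  # start from the top-level document
    iframe = driver.find_element("xpath", f"//iframe[@src='{iframe_src}']")
    driver.switch_to.frame(iframe)      # enter the outer iframe
    driver.switch_to.frame(frame_name)  # enter the nested frame by name or id
    driver.find_element("id", element_id).click()


# Possible usage with the names from the question:
# click_in_nested_frame(browser, "https://products.com/InfoShareAuthor/home",
#                       "MenuBar", "MenuButton1")
```

If the frames load slowly, an explicit wait before each switch may also be needed.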

Printing dynamic django view template

I'm working on a django app. I have a page that displays a log of items, and each item has a "Print label" link. At the moment, clicking the link displays the label for that particular item in a popup screen, but does not send the label to a printer. The view function behind the "Print label" link is shown below:
@login_required
def print_label(request, id):
    s = Item.objects.get(pk=id)
    return render_to_response('templates/label.html', {'s': s}, context_instance=RequestContext(request))
The HTML for the label is shown below:
{% load humanize %}
<head>
<style type="text/css">
div{
min-width: 350px;
max-width: 350px;
text-align: center;
}
body{
font-family: Arial;
width: 370px;
height: 560px;
text-align: center;
}
</style>
</head>
<body>
<div id="labelHeader">
<img src="{{ STATIC_URL }}img/label-header.png" width="350px">
</div>
<hr/>
<p></p>
<div id="destinationAddress">
<span style="font-size: xx-large; font-weight: bold;">{{ s.item_number }}</span>
</p>
DESTINATION:
<br/>
<strong>{{s.full_name}}</strong><br/>
<strong>{{ s.address }}</strong><br/>
<strong>{{s.city}}, {{s.state}}</strong><br/>
<strong>Tel: {{s.telephone}}</strong>
</div>
<p></p>
<hr/>
<div id="labelfooter">
<img src="{{ STATIC_URL }}img/label-footer.png" width="350px">
</div>
</body>
My question is, how can I also send the label displayed to a printer in the same function? I researched and found some libraries (like xhtml2pdf, webkit2png, pdfcrowd, etc), but they'll create a pdf or image file of the label and I'll have to send it to a printer. Is it possible to send straight to a printer without creating a pdf copy of the label? If so, please show me how to achieve this.
Your answers and suggestions are highly welcome. Thank you.
Presumably, as this is a Django app, it's the client's printer that you need to use. The only way to do this is to tell the user's browser to print. You will need to use Javascript for this: window.print().

Bottle Static files

I have tried reading the docs for Bottle, however, I am still unsure about how static file serving works. I have an index.tpl file, and within it it has a css file attached to it, and it works. However, I was reading that Bottle does not automatically serve css files, which can't be true if the page loads correctly.
I have, however, run into speed issues when requesting the page. Is that because I didn't use the return static_file(params go here)? If someone could clear up how they work, and how they are used when loading the page, it would be great.
Server code:
from bottle import route, run, template, request, static_file

@route('/')
def home():
    return template('Templates/index', name=request.environ.get('REMOTE_ADDR'))

run(host='Work-PC', port=9999, debug=True)
Index:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv="content-type"
content="text/html; charset=ISO-8859-1">
<title>index</title>
<link type="text/css"
href="cssfiles/mainpagecss.css"
rel="stylesheet">
</head>
<body>
<table
style="width: 100%; text-align: left; margin-left: auto; margin-right: auto;"
border="0" cellpadding="2" cellspacing="2">
<tbody>
<tr>
<td>
<h1><span class="headertext">
<center>Network
Website</center>
</span></h1>
</td>
</tr>
</tbody>
</table>
%if name!='none':
<p align="right">signed in as: {{name}}</p>
%else:
pass
%end
<br>
<table style="text-align: left; width: 100%;" border="0" cellpadding="2"
cellspacing="2">
<tbody>
<tr>
<td>
<table style="text-align: left; width: 100%;" border="0"
cellpadding="2" cellspacing="2">
<tbody>
<tr>
<td style="width: 15%; vertical-align: top;">
<table style="text-align: left; width: 100%;" border="1"
cellpadding="2" cellspacing="2">
<tbody>
<tr>
<td>Home<br>
<span class="important">Teamspeak Download</span><br>
<span class="important">Teamspeak Information</span></td>
</tr>
</tbody>
</table>
</td>
<td style="vertical-align: top;">
<table style="text-align: left; width: 100%;" border="1"
cellpadding="2" cellspacing="2">
<tbody>
<tr>
<td>
<h1><span style="font-weight: bold;">Network Website</span></h1>
To find all of the needed information relating to the network's social
capabilities, please refer to the links in the side bar.</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</body>
</html>
To serve static files using bottle you'll need to use the provided static_file function and add a few additional routes. The following routes direct the static file requests and ensure that only files with the correct file extension are accessed.
from bottle import get, static_file

# Static Routes
@get("/static/css/<filepath:re:.*\.css>")
def css(filepath):
    return static_file(filepath, root="static/css")

@get("/static/font/<filepath:re:.*\.(eot|otf|svg|ttf|woff|woff2?)>")
def font(filepath):
    return static_file(filepath, root="static/font")

@get("/static/img/<filepath:re:.*\.(jpg|png|gif|ico|svg)>")
def img(filepath):
    return static_file(filepath, root="static/img")

@get("/static/js/<filepath:re:.*\.js>")
def js(filepath):
    return static_file(filepath, root="static/js")
Now in your html, you can reference a file like so:
<link type="text/css" href="/static/css/main.css" rel="stylesheet">
Directory layout:
static
|-- css
|-- font
|-- img
`-- js
Just providing an answer here because a number of my students have used this code in an assignment and I have a bit of a concern about the solution.
The standard way to serve static files in Bottle is in the documentation:
from bottle import route, static_file

@route('/static/<filepath:path>')
def server_static(filepath):
    return static_file(filepath, root='/path/to/your/static/files')
In this way, all of the files below your static folder are served from a URL starting with /static. In your HTML, you need to reference the full URL path for the resource, e.g.:
<link rel='stylesheet' type='text/css' href='/static/css/style.css'>
The answer from Sanketh makes it so that any reference to an image, css file etc anywhere in the URL space is served from a given folder inside the static folder. So /foo/bar/baz/picture.jpg and /picture.jpg would both be served from static/images/picture.jpg. This means that you don't need to worry about getting the path right in your HTML code and you can always use relative filenames (ie. just src="picture.jpg").
The problem with this approach comes when you try to deploy your application. In a production environment you want the static resources to be served by a web server like nginx, not by your Bottle application. To enable this, they should all be served from a single part of the URL space, eg. /static. If your code is littered with relative filenames, it won't translate easily to this model.
So, I'd advise using the three line solution from the Bottle tutorial rather than the more complex solution listed on this page. It's simpler code (so less likely to be buggy) and it allows you to seamlessly move to a production environment without code changes.
As indicated in the documentation, you should serve static files using the static_file function, and a CSS file is a static file. The static_file function handles security and some other concerns, which you can read about in the source. The root argument to static_file should point to the directory where you store the CSS files.
Rather than use regular expression matching for serving files as in Sanketh's answer, I'd prefer not to modify my templates and provide a path to the static files explicitly, as in:
<script src="{{ get_url('static', filename='js/bootstrap.min.js') }}"></script>
You can do this simply by replacing the <filename> in the static route decorator with one of type :path - like so:
@app.route('/static/<filename:path>', name='static')
def serve_static(filename):
    return static_file(filename, root=config.STATIC_PATH)
The :path matches an entire file path in a non-greedy way so you don't have to worry about changing templates when switching to production - just keep everything in the same relative folder structure.
I've used Sanketh's template in the past but over time condensed it to an extension agnostic function. You just have to add extension-folder mappings in the ext_map dictionary. It defaults to static/ folder if an extension is not mapped explicitly.
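import os.path
from bottle import get, static_file

# Static Routes
@get('/<filename>')
def serve_static_file(filename):
    # Extension without the leading dot, e.g. 'png'
    ext = os.path.splitext(filename)[1][1:]
    ext_map = {'image': ['png', 'gif', 'jpg', 'ico'], 'js': ['js']}
    # First folder whose extension list contains ext, else '' (plain static/)
    sub_folder = next((k for k, v in ext_map.items() if ext in v), '')
    return static_file(filename, root='static/' + sub_folder)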
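To see how the extension lookup in that last answer resolves, here is the same mapping logic pulled out into a standalone function that runs without Bottle. The names EXT_MAP and sub_folder_for are hypothetical; the mapping itself is copied from the answer.

```python
import os.path

# Same mapping as in the answer: sub-folder name -> extensions stored there.
EXT_MAP = {'image': ['png', 'gif', 'jpg', 'ico'], 'js': ['js']}


def sub_folder_for(filename):
    """Return the static sub-folder for a filename, or '' if the extension is unmapped."""
    ext = os.path.splitext(filename)[1][1:]  # extension without the leading dot
    return next((k for k, v in EXT_MAP.items() if ext in v), '')


print(sub_folder_for('logo.png'))         # prints: image
print(sub_folder_for('app.js'))           # prints: js
print(repr(sub_folder_for('style.css')))  # prints: ''
```

An unmapped extension such as css falls back to the empty string, so the file is looked up directly under static/.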
