How to automatically generate a PDF of a website?


I have a website that has some charts and graphs made using JavaScript libraries. What's a good way to render the HTML, CSS, and JS server-side and then capture the result as a PDF / PNG / JPG? I'd like to auto-generate reports and email them to my users.
Any programming language is fine, but Ruby / Rails would be best.

I've heard of the wkhtmltopdf project. It uses the WebKit rendering engine to produce PDFs from a webpage. It offers Python bindings, and Ruby bindings are also available: PDFKit.

wkhtmltopdf is a good tool to use. I've just used it to generate 500+ PDF documents in one day using a rake task. If you're interested in gems that take advantage of wkhtmltopdf, you can try WickedPDF or PDFKit.
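A batch job like that rake task can be sketched in Python by calling the wkhtmltopdf command line directly. `--javascript-delay` is a real wkhtmltopdf option that waits (in milliseconds) after page load, which matters for JavaScript charts; the 1000 ms default and the helper names below are assumptions for illustration, not part of any library.

```python
import shutil
import subprocess


def build_wkhtmltopdf_command(url, output_path, js_delay_ms=1000):
    """Return the argv list for rendering `url` to `output_path`.

    --javascript-delay makes wkhtmltopdf wait this many milliseconds after
    the page loads, giving chart libraries time to finish drawing.
    """
    return [
        "wkhtmltopdf",
        "--javascript-delay", str(js_delay_ms),
        url,
        output_path,
    ]


def render_reports(jobs, js_delay_ms=1000):
    """Render each (url, output_path) pair; stops on the first failure."""
    if shutil.which("wkhtmltopdf") is None:
        raise FileNotFoundError("wkhtmltopdf is not installed or not on PATH")
    for url, output_path in jobs:
        # check=True raises CalledProcessError if wkhtmltopdf exits non-zero.
        subprocess.run(
            build_wkhtmltopdf_command(url, output_path, js_delay_ms),
            check=True,
        )
```

For example, `render_reports([("https://example.com/reports/42", "report-42.pdf")])` (hypothetical URL) would write one PDF that can then be attached to an email.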

Related

Preview files with Python / Django

Is there a way to preview files (like .doc, .docx, .png, .jpg, .pdf) in the browser without using Google or Microsoft URLs?
Searching on Google, I cannot find a solution that fits this case. I cannot use cloud solutions.

You can use PDF.js to view PDFs. For the other formats, you could try converting them to PDF. This is what Document Management System solutions like Nuxeo do.

If you want to do it purely in Python, there are libraries for each format, such as pdf2html and docx2html, but a frontend utility or framework is usually the better choice: the Python equivalents tend to be buggy and can slow down your app, since they often depend on things like libxml.
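The split described above (let the browser show PDFs and images directly, convert everything else to PDF first) can be sketched with the standard library's `mimetypes` module. The `INLINE_TYPES` set and the `preview_plan` helper are hypothetical, and the actual conversion step is left out.

```python
import mimetypes

# Formats that mainstream browsers can usually display on their own.
INLINE_TYPES = {"application/pdf", "image/png", "image/jpeg", "image/gif"}


def preview_plan(filename):
    """Return (action, headers) for previewing `filename` in a browser.

    Natively viewable files are served inline; anything else (e.g. .doc or
    .docx) is flagged for conversion to PDF before it is shown.
    """
    mime, _encoding = mimetypes.guess_type(filename)
    if mime in INLINE_TYPES:
        headers = {
            "Content-Type": mime,
            # "inline" asks the browser to display the file rather than download it.
            "Content-Disposition": f'inline; filename="{filename}"',
        }
        return "inline", headers
    return "convert-to-pdf", {}
```

In a Django view, the returned headers would be set on the response that streams the file.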

Approaches to embedding vector images/charts into PDF

How have people from the Linux world embedded vector images into PDF?
I am attempting to create automated reports from data that I currently render as SVG images. Ideally, I would like to use the same XML in PostScript, PDF or DjVu format. To what degree are those formats able to handle SVG natively?
More broadly, what have people's experiences been? Should I:
- reuse the native SVG XML?
- rasterise SVGs that have already been created?
- or use another format?
I'm restricted to formats that are accessible from Ubuntu 10.04 & Python. This will probably exclude me from utilising Adobe Illustrator files.

Investigate Apache FOP; its main purpose is converting XML (XSL-FO) to PDF.
Upsides (for this project):
- full Apache project (=> reliable)
Downsides (for this project):
- will need to learn XSL-FO
- not Python
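To make the XSL-FO learning curve concrete: an existing SVG can be reused unchanged by wrapping it in `fo:instream-foreign-object`. The sketch below builds a minimal one-page XSL-FO document from Python with `xml.etree.ElementTree`; the helper names, the A4 page size, and the 20mm margin are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
SVG_NS = "http://www.w3.org/2000/svg"


def fo(tag):
    """Qualified XSL-FO tag name in ElementTree's {namespace}tag form."""
    return f"{{{FO_NS}}}{tag}"


def wrap_svg_in_fo(svg_markup):
    """Wrap an existing SVG document in a one-page XSL-FO document.

    The SVG is embedded unchanged via fo:instream-foreign-object, so the
    chart stays vector data rather than being rasterised.
    """
    ET.register_namespace("fo", FO_NS)
    ET.register_namespace("svg", SVG_NS)
    svg = ET.fromstring(svg_markup)  # the SVG must declare the SVG namespace

    root = ET.Element(fo("root"))
    masters = ET.SubElement(root, fo("layout-master-set"))
    page = ET.SubElement(masters, fo("simple-page-master"), {
        "master-name": "A4", "page-width": "210mm", "page-height": "297mm",
    })
    ET.SubElement(page, fo("region-body"), {"margin": "20mm"})

    sequence = ET.SubElement(root, fo("page-sequence"), {"master-reference": "A4"})
    flow = ET.SubElement(sequence, fo("flow"), {"flow-name": "xsl-region-body"})
    block = ET.SubElement(flow, fo("block"))
    ET.SubElement(block, fo("instream-foreign-object")).append(svg)
    return ET.tostring(root, encoding="unicode")
```

Writing the result to `report.fo` and running `fop -fo report.fo -pdf report.pdf` should then produce the PDF; FOP handles the embedded SVG through Batik.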

Batik is a nice Java SVG library. It has a utility called batik-rasterizer.jar which can convert SVG into some useful formats: PDF, TIFF, PNG, and GIF.
You could use Jython to call this library from Python.
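If Jython is not an option, a plain CPython script can also run the rasterizer as a separate Java process. This swaps Jython for `subprocess`; it assumes `java` is on the PATH and that the jar sits at the path given (real distributions usually carry a version number in the jar name). `-m` selects the output MIME type and `-d` the output directory.

```python
import subprocess


def build_batik_command(svg_path, out_dir, mime_type="application/pdf",
                        jar_path="batik-rasterizer.jar"):
    """Return the argv list for converting one SVG file with Batik's rasterizer."""
    return ["java", "-jar", jar_path, "-m", mime_type, "-d", out_dir, svg_path]


def rasterize(svg_path, out_dir, mime_type="application/pdf"):
    """Run the conversion; raises CalledProcessError if Java exits non-zero."""
    subprocess.run(build_batik_command(svg_path, out_dir, mime_type), check=True)
```

Passing `mime_type="image/png"` instead would give a raster copy of the same chart.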

HTML page to PDF in Python?

Is there a library available to convert an HTML page (text, images, layout elements, etc.) to a PDF file?
I have an HTML page with figures, text, and tables of numbers, which I want my clients to be able to download as a PDF. How do I do this with Python?

I'm not too familiar with Python, but Prince is nice if you are willing to shell out the cash. There is also http://github.com/antialize/wkhtmltopdf, which uses WebKit. It is a simple command-line utility that you can call, and it honors HTML+CSS. As far as I know, it is the only free tool that does this well. There is a Ruby gem for it, http://github.com/jdpace/PDFKit; not that it helps you, but it might give you some ideas.
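For a "download as PDF" button, the page can be piped through wkhtmltopdf without temporary files: passing `-` as the input reads HTML from stdin, and `-` as the output writes the PDF to stdout. The helper name below is hypothetical. Because the HTML arrives on stdin, images and stylesheets referenced by relative URLs will likely not resolve, so absolute URLs are the safer choice.

```python
import shutil
import subprocess


def html_to_pdf_bytes(html_text):
    """Convert an HTML string to PDF bytes by piping it through wkhtmltopdf."""
    if shutil.which("wkhtmltopdf") is None:
        raise FileNotFoundError("wkhtmltopdf is not installed or not on PATH")
    result = subprocess.run(
        ["wkhtmltopdf", "-", "-"],  # read HTML from stdin, write PDF to stdout
        input=html_text.encode("utf-8"),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,  # capture diagnostics instead of mixing them into logs
        check=True,
    )
    return result.stdout
```

The returned bytes can be sent as the body of an HTTP response with `Content-Type: application/pdf`.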

Well, there are the ReportLab and html2pdf modules, but for best results I'd probably try calling Prince externally (http://www.princexml.com/doc/6.0/python/).

Have you heard of xhtml2pdf/pisa?
It can work either as a Python module or as a standalone command-line utility.
You can use the documentation here to get started:
http://www.xhtml2pdf.com/doc/pisa-en.html

Python HTML to PDF with floating divs

Is there a way to convert XHTML/HTML with CSS to PDF with floating divs?
I have tried pisa/xhtml2pdf in Python and dompdf in PHP, and neither can do it.
Is there any way?

See html-tables-to-pdf-in-php-neither-dompdf-nor-html2ps-pdf-are-working.
A possible path is to use a layout (rendering) engine such as WebKit or Gecko.
The rendered HTML page can then be saved as PDF. An example of a tool that uses this method is the wkhtmltopdf project.
(I know this is not related to Python or PHP, but you can still drive the tool from a script.)

I found a blog post that does this very thing:
http://web.archive.org/web/20130525082452/http://notes.alexdong.com/xhtml-to-pdf-using-pyqt4-webkit-and-headless
I got it working rather quickly using the more mature PyQt4 module.

Pure python solution to convert XHTML to PDF

I am after a pure Python solution (for GAE) to convert webpages to PDF.
I had a look at ReportLab, but the documentation focuses on generating PDFs from scratch rather than converting from HTML.
What do you recommend? pisa?
Edit:
My use case is that I have an HTML report that I want to make available as PDF too. I will be updating this report's structure, so I don't want to maintain a separate PDF version; I'd rather (hopefully) convert it automatically.
Also, because I generate the report HTML myself, I can ensure it is well-formed XHTML to make the PDF conversion easier.

Pisa claims to support what I want to do:

"pisa is a html2pdf converter using the ReportLab Toolkit, the HTML5lib and pyPdf. It supports HTML 5 and CSS 2.1 (and some of CSS 3). It is completely written in pure Python so it is platform independent. The main benefit of this tool that a user with Web skills like HTML and CSS is able to generate PDF templates very quickly without learning new technologies. Easy integration into Python frameworks like CherryPy, KID Templating, TurboGears, Django, Zope, Plone, Google AppEngine (GAE) etc."

So I will investigate it further.
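Since the report HTML is generated in-house and meant to be well-formed XHTML, a quick parse check before handing it to a converter such as pisa can catch mistakes early. This sketch uses the standard library's XML parser; the helper name is hypothetical. One side effect worth knowing: without a DTD, HTML-only entities such as `&nbsp;` are rejected, so numeric references like `&#160;` are the safer choice.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xhtml):
    """Return None if `xhtml` parses as XML, otherwise the parser's error message."""
    try:
        ET.fromstring(xhtml)
    except ET.ParseError as exc:
        return str(exc)
    return None
```

Calling it in a test suite or just before conversion turns a vague rendering failure into a line-and-column error.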

Have you considered pyPdf? I doubt it has anywhere near the functional richness you require, but it IS a start, and it is pure Python. The PdfFileWriter class is the one that generates PDF output; unfortunately, it requires PageObject instances and doesn't provide real ways to put those together, other than extracting them from existing PDF documents. Unfortunately, all the richer PDF page-generation packages I can find appear to depend on ReportLab or other non-pure-Python libraries :-(.

What you're asking for is a pure Python HTML renderer, which is a big task to say the least ('real' renderers like WebKit are the product of thousands of hours of work). As far as I'm aware, there aren't any.
Instead of looking for an HTML to PDF converter, I'd suggest building your report in a format that's easily converted to both: for example, you could build it as a DOM (a set of linked objects) and write converters for both HTML and PDF output. This is a much more limited problem than converting HTML to PDF, and hence much easier to implement.
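The intermediate-model approach can be sketched in pure Python, which also fits the GAE constraint in the question. The `Heading`/`Paragraph` classes and both converters are hypothetical. The PDF writer is a deliberately tiny hand-rolled one (one page, built-in Helvetica fonts, no line wrapping, Latin-1 text only); a real project would more likely use ReportLab for that half, but the shape of the solution stays the same: one model, two renderers.

```python
import html
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Heading:
    text: str


@dataclass
class Paragraph:
    text: str


Block = Union[Heading, Paragraph]


def to_html(blocks: List[Block]) -> str:
    """Render the report model as an HTML fragment."""
    parts = []
    for block in blocks:
        tag = "h1" if isinstance(block, Heading) else "p"
        parts.append(f"<{tag}>{html.escape(block.text)}</{tag}>")
    return "\n".join(parts)


def _pdf_escape(text: str) -> str:
    # Backslashes and parentheses must be escaped inside PDF literal strings.
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def to_pdf(blocks: List[Block]) -> bytes:
    """Render the same model as a minimal one-page PDF (Latin-1 text only)."""
    ops = ["BT", "18 TL", "72 720 Td"]  # begin text, set leading, move to top-left
    for block in blocks:
        font = "/F2 16 Tf" if isinstance(block, Heading) else "/F1 11 Tf"
        ops.append(f"{font} ({_pdf_escape(block.text)}) Tj T* T*")
    ops.append("ET")
    content = "\n".join(ops).encode("latin-1")

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 4 0 R /F2 5 0 R >> >> /Contents 6 0 R >>",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica-Bold >>",
        b"<< /Length " + str(len(content)).encode() + b" >>\nstream\n"
        + content + b"\nendstream",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of each object, used by the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_at = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # each xref entry is exactly 20 bytes
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_at
    return bytes(out)
```

With `blocks = [Heading("Q1 report"), Paragraph("Sales & costs")]`, the same list feeds both `to_html(blocks)` for the web page and `to_pdf(blocks)` for the download, so a change to the report structure only touches the model.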
