Approaches to embedding vector images/charts into PDF - Python

How have people from the Linux world embedded vector images into PDF?
I am attempting to create automated reports from data that I currently render as SVG images. Ideally, I would like to use the same XML in PostScript, PDF or DjVu format. To what degree are those formats able to handle SVG natively?
More broadly, what have people's experiences been? Should I
reuse the native SVG XML?
rasterise SVGs that have already been created?
or use another format?
I'm restricted to formats that are accessible from Ubuntu 10.04 & Python. This will probably exclude me from utilising Adobe Illustrator files.

Investigate Apache FOP; its main purpose is converting XML (specifically XSL-FO) to PDF.
Upsides (for this project):
full Apache project (=> reliable)
Downsides (for this project):
Will need to learn XSL-FO
Not Python
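A minimal sketch of the FOP route. It assumes the `fop` launcher script is on the PATH and that the chart already exists as an SVG file; `chart.svg` and the helper names below are hypothetical. It builds a bare XSL-FO document with the standard library and references the SVG through `fo:external-graphic`, which FOP normally hands to its bundled Batik so the chart stays vector in the PDF.

```python
import subprocess
import xml.etree.ElementTree as ET
from typing import List

FO = "http://www.w3.org/1999/XSL/Format"  # XSL-FO namespace URI


def build_fo_document(svg_path: str) -> str:
    """Return a minimal XSL-FO document that places one SVG on an A4 page."""
    ET.register_namespace("fo", FO)
    root = ET.Element(f"{{{FO}}}root")
    masters = ET.SubElement(root, f"{{{FO}}}layout-master-set")
    page = ET.SubElement(masters, f"{{{FO}}}simple-page-master", {
        "master-name": "A4", "page-width": "210mm",
        "page-height": "297mm", "margin": "20mm"})
    ET.SubElement(page, f"{{{FO}}}region-body")
    seq = ET.SubElement(root, f"{{{FO}}}page-sequence", {"master-reference": "A4"})
    flow = ET.SubElement(seq, f"{{{FO}}}flow", {"flow-name": "xsl-region-body"})
    block = ET.SubElement(flow, f"{{{FO}}}block")
    # The SVG is only referenced here, not rasterised; FOP renders it at PDF time.
    ET.SubElement(block, f"{{{FO}}}external-graphic", {"src": f"url('{svg_path}')"})
    return ET.tostring(root, encoding="unicode")


def fop_command(fo_path: str, pdf_path: str) -> List[str]:
    """FOP's usual command-line form: fop -fo input.fo -pdf output.pdf."""
    return ["fop", "-fo", fo_path, "-pdf", pdf_path]


def render(svg_path: str, fo_path: str, pdf_path: str) -> None:
    with open(fo_path, "w", encoding="utf-8") as fh:
        fh.write(build_fo_document(svg_path))
    # check=True raises CalledProcessError if FOP exits with an error.
    subprocess.run(fop_command(fo_path, pdf_path), check=True)
```

Usage would be `render("chart.svg", "report.fo", "report.pdf")`; a real report would add text blocks around the graphic in the same flow.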

Batik is a nice Java SVG library. It includes a utility, batik-rasterizer.jar, which can convert SVG into some useful formats: PDF, TIFF, PNG, and GIF.
You could use Jython to call this library from Python.
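If Jython is not an option, the rasterizer can also be driven from ordinary CPython through its command line. This is a swap for the Jython approach, not what the answer describes. A sketch, assuming `java` is on the PATH and the jar sits in the working directory (the exact jar file name varies between Batik releases); `-m` selects the output MIME type and `-d` the destination.

```python
import subprocess
from typing import List


def batik_command(svg_path: str, out_dir: str,
                  jar: str = "batik-rasterizer.jar",
                  mime: str = "application/pdf") -> List[str]:
    """Build a batik-rasterizer call that converts one SVG into out_dir."""
    return ["java", "-jar", jar, "-m", mime, "-d", out_dir, svg_path]


def convert(svg_path: str, out_dir: str, **kwargs) -> None:
    # check=True turns a non-zero exit status from the JVM into CalledProcessError.
    subprocess.run(batik_command(svg_path, out_dir, **kwargs), check=True)
```

Passing `mime="image/png"` instead gives a raster fallback from the same SVG source.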

Related

How to open .fif file format?

I want to open a .fif file of around 800 MB. I googled and found that these kinds of files can be opened with Photoshop. Is there a way to extract the images and store them in some other standard format using Python or C++?
This is probably an EEG or MEG data file. The full specification is here, and it can be read in with the MNE package in Python.
import mne
raw = mne.io.read_raw_fif('filename.fif')
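Since the answers disagree about what a .fif file is, a look at the first bytes can tell you which kind you have before reaching for MNE. A sketch, assuming (as MNE's FIF reader appears to require) that a MEG/EEG FIF file starts with a tag header of four big-endian 32-bit integers whose first value is the file-ID tag kind, 100; a Genuine Fractals file is not expected to match. The function names are hypothetical.

```python
import struct

FIFF_FILE_ID = 100  # assumed tag kind of the first tag in a MEG/EEG FIF file


def is_meg_fif_header(header: bytes) -> bool:
    """True if the first 16 bytes look like a FIF tag header (kind, type, size, next)."""
    if len(header) < 16:
        return False
    kind, _type, _size, _next = struct.unpack(">iiii", header[:16])
    return kind == FIFF_FILE_ID


def looks_like_meg_fif(path: str) -> bool:
    with open(path, "rb") as fh:
        return is_meg_fif_header(fh.read(16))
```

If `looks_like_meg_fif("filename.fif")` is true, `mne.io.read_raw_fif` is the right tool; otherwise it is more likely the fractal image format.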
FIF stands for Fractal Image Format and seems to be the output format of the Genuine Fractals plugin for Adobe Photoshop. Unfortunately, there is no format specification available and the plugin claims to use patented algorithms, so you won't be able to read these files from within your own software.
There are, however, other tools that can do fractal compression. Here's some information about one example. While this won't allow you to open FIF files from the Genuine Fractals plugin, it would allow you to compress the original file, if it is still available.
XnView seems to handle FIF files, but it's Windows-only. There is an MP (multiplatform) version, but it seems less complete and didn't work when I tried to view a FIF file.
Update: XnView MP, which does work on Linux and OS X, claims to support FIF, but I couldn't get it to work.
Update 2: There's also an open source project, Fiasco, that can work with fractal images, but I'm not sure it's compatible with the proprietary FIF format.

Is there a Python library to create thumbnails for various document file formats?

I'd like to generate thumbnails from various "document" file formats such as odt, doc(x) and ppt(x), but also mp4, psd, tiff (and possibly others), from a Python application. As far as I know, for each of these formats there is at least one open source application which can generate preview images/thumbnails (e.g. LibreOffice, ffmpeg) or at least extract embedded thumbnails (e.g. ImageMagick).
My main problem is that each of these applications/libraries uses different command line options, so I'm looking for a Python library (or a unified CLI tool) that provides a high-level API: given a filename, it generates a thumbnail with the specified dimensions and quality level by calling the appropriate external tool (ideally catching exceptions, segfaults and timeouts). Bonus points if it can generate multiple thumbnails on request (e.g. one per page, pages X-Y, or every Z seconds but at most N images).
Does anyone know such a library/utility? (Boundary condition: The files may contain sensitive material or might be quite big so this must work without any network communication, using an external web service is not possible.)
If there is no such thing in Python, a locally installable web service would be fine as well.
I ended up writing my own library (named anythumbnailer, MIT license), which worked well enough for my immediate needs. The library is not what I envisioned (only basic thumbnailing, no support for dimensions, …) but it can generate thumbnails for doc(x), xls(x), ppt(x), videos and PDF on Linux with the help of ffmpeg and LibreOffice.
You could look at preview-generator, a library for generating previews (thumbnails, PDF, text and JSON overviews) for all your file-based content. It gives you access to JPEG, PDF, text, HTML and JSON previews of virtually any kind of file, and it includes a cache mechanism so you do not have to worry about preview storage.
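The core of such a wrapper is a table mapping file extensions to external commands, plus one runner that enforces a timeout. A minimal sketch of that idea, not anythumbnailer's actual API; the extension sets and function names are assumptions, and it expects `soffice`, `ffmpeg` and ImageMagick's `convert` on the PATH.

```python
import subprocess
from pathlib import Path
from typing import List

OFFICE = {".odt", ".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx"}
VIDEO = {".mp4", ".mkv", ".avi", ".mov"}
IMAGE = {".psd", ".tif", ".tiff", ".pdf"}  # ImageMagick needs Ghostscript for PDF


def thumbnail_command(src: str, out_dir: str, width: int = 256) -> List[str]:
    """Pick the external tool for src and build its command line."""
    path = Path(src)
    ext = path.suffix.lower()
    out = str(Path(out_dir) / (path.stem + ".png"))
    if ext in OFFICE:
        # LibreOffice renders the first page as out_dir/<stem>.png; width is ignored.
        return ["soffice", "--headless", "--convert-to", "png", "--outdir", out_dir, src]
    if ext in VIDEO:
        # Grab one frame 1 s in (skips a typically black first frame), scaled to width.
        return ["ffmpeg", "-y", "-ss", "1", "-i", src, "-frames:v", "1",
                "-vf", f"scale={width}:-1", out]
    if ext in IMAGE:
        # [0] selects the first layer/page; -thumbnail fits inside a width x width box.
        return ["convert", f"{src}[0]", "-thumbnail", f"{width}x{width}", out]
    raise ValueError(f"no thumbnailer for {ext!r}")


def make_thumbnail(src: str, out_dir: str, width: int = 256,
                   timeout: float = 60.0) -> bool:
    """Run the tool; return False on failure, crash, timeout or missing binary."""
    cmd = thumbnail_command(src, out_dir, width)
    try:
        subprocess.run(cmd, check=True, timeout=timeout,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired, FileNotFoundError):
        return False
    return True
```

A tool killed by a segfault exits with a non-zero (negative) return code, so `check=True` reports it as `CalledProcessError` just like an ordinary failure.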

How to automatically generate a PDF of a website?

I have a website with charts and graphs made using JavaScript libraries. What's a good way to render the HTML, CSS and JS on the server side and capture the result as a PDF/PNG/JPG? I'd like to auto-generate reports and email them to my users.
Any programming language is fine, but Ruby / Rails would be best.
I've heard of the wkhtmltopdf project. It uses the WebKit rendering engine to produce PDFs from a web page. It offers Python bindings, and Ruby bindings are also available: PDFKit.
wkhtmltopdf is a good tool to use. I've just used it to generate 500+ PDF documents in one day using a rake task. If you're interested in gems that take advantage of wkhtmltopdf, you can try WickedPDF or PDFKit.
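From Python, the simplest route is to call the wkhtmltopdf binary (or its sibling wkhtmltoimage for PNG/JPG) directly rather than through bindings. A sketch, assuming both binaries are on the PATH; `--javascript-delay` gives the chart libraries time to draw before capture, and the 2000 ms default here is a guess you would tune for your pages.

```python
import subprocess
from typing import List


def wkhtml_command(url: str, out_path: str, js_delay_ms: int = 2000) -> List[str]:
    """Use wkhtmltopdf for .pdf output, wkhtmltoimage for anything else (png/jpg)."""
    tool = "wkhtmltopdf" if out_path.lower().endswith(".pdf") else "wkhtmltoimage"
    return [tool, "--javascript-delay", str(js_delay_ms), url, out_path]


def capture(url: str, out_path: str, js_delay_ms: int = 2000,
            timeout: float = 120.0) -> None:
    # timeout stops a page that never finishes loading from blocking the report job.
    subprocess.run(wkhtml_command(url, out_path, js_delay_ms), check=True, timeout=timeout)
```

A nightly job would call `capture(report_url, "report.pdf")` per user and attach the file to the email.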

Compress PDFs using Python

So I have a gazillion PDFs in a folder, and I want to shrink them recursively (using os.path.walk). I see that Adobe Pro has a "save as reduced size" option. Could I use this, and if not, how would you suggest I do it?
Note: Yes, I would like them to stay as PDFs, because I find that to be the most commonly used format, with the most widely installed viewers.
From the project's GitHub page for pdfsizeopt, which is written in Python:
"pdfsizeopt is a program for converting large PDF files to small ones. More specifically, pdfsizeopt is a free, cross-platform command-line application (for Linux, Mac OS X, Windows and Unix) and a collection of best practices to optimize the size of PDF files, with focus on PDFs created from TeX and LaTeX documents. pdfsizeopt is written in Python..."
You can probably easily adapt this to your specific needs.
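`os.path.walk` only exists in Python 2; in Python 3 the recursive part is `os.walk`. A sketch pairing it with pdfsizeopt, assuming its command-line form `pdfsizeopt input.pdf output.pdf` and an executable named `pdfsizeopt` on the PATH (adjust to wherever you installed it); the `.opt.pdf` naming is a hypothetical convention that keeps the originals untouched.

```python
import os
import subprocess
from pathlib import Path
from typing import Iterable, List


def find_pdfs(root: str) -> List[Path]:
    """Collect every *.pdf under root (case-insensitive) before any output is written."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".pdf"):
                found.append(Path(dirpath) / name)
    return found


def plan_commands(pdfs: Iterable[Path], exe: str = "pdfsizeopt") -> List[List[str]]:
    """One command per PDF, writing <stem>.opt.pdf next to the original."""
    commands = []
    for pdf in pdfs:
        if pdf.name.lower().endswith(".opt.pdf"):
            continue  # output of an earlier run; don't optimise it again
        out = pdf.with_name(pdf.stem + ".opt.pdf")
        commands.append([exe, str(pdf), str(out)])
    return commands


def shrink_all(root: str, exe: str = "pdfsizeopt") -> None:
    for cmd in plan_commands(find_pdfs(root), exe):
        subprocess.run(cmd, check=True)
```

Once you are happy with the quality of the `.opt.pdf` files, replacing the originals is a separate, deliberate step.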
I realize this is an old question, but I thought I would suggest an alternative to pdfsizeopt, as I have experienced quality loss using it on PDFs of maps. PDFTron offers a comprehensive set of functionality. Here is a snippet modified from their web page (see "example 1"):
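import site
site.addsitedir(r"...pathToPDFTron\PDFNetWrappersWin32\PDFNetC\Lib")
from PDFNetPython import PDFNet, PDFDoc, Optimizer, SDFDoc

PDFNet.Initialize()  # the library must be initialized before any other call
inPDF_Path = "input.pdf"        # placeholder paths
outPDF_Path = "input_opt.pdf"
doc = PDFDoc(inPDF_Path)
doc.InitSecurityHandler()
Optimizer.Optimize(doc)  # optimize with the default settings
doc.Save(outPDF_Path, SDFDoc.e_linearized)  # linearized = fast web view
doc.Close()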

Pure python solution to convert XHTML to PDF

I am after a pure Python solution (for GAE) to convert web pages to PDF.
I had a look at ReportLab, but the documentation focuses on generating PDFs from scratch rather than converting from HTML.
What do you recommend? Pisa?
Edit:
My use case is that I have an HTML report that I want to make available as a PDF too. I will make updates to this report's structure, so I don't want to maintain a separate PDF version but (hopefully) convert it automatically.
Also because I generate the report HTML I can ensure it is well formed XHTML to make the PDF conversion easier.
Pisa claims to support what I want to do:
pisa is a html2pdf converter using the ReportLab Toolkit, the HTML5lib and pyPdf. It supports HTML 5 and CSS 2.1 (and some of CSS 3). It is completely written in pure Python so it is platform independent. The main benefit of this tool [is] that a user with Web skills like HTML and CSS is able to generate PDF templates very quickly without learning new technologies. Easy integration into Python frameworks like CherryPy, KID Templating, TurboGears, Django, Zope, Plone, Google AppEngine (GAE) etc.
So I will investigate it further.
Have you considered pyPdf? I doubt it has anywhere near the functional richness you require, but it IS a start, and it is pure Python. The PdfFileWriter class would be the one to generate PDF output; unfortunately, it requires PageObject instances and doesn't provide real ways to put those together, except by extracting them from existing PDF documents. All the richer PDF page-generation packages I can find appear to depend on ReportLab or other non-pure-Python libraries :-(.
What you're asking for is a pure Python HTML renderer, which is a big task to say the least ('real' renderers like WebKit are the product of thousands of hours of work). As far as I'm aware, there aren't any.
Instead of looking for an HTML to PDF converter, what I'd suggest is building your report in a format that's easily converted to both - for example, you could build it as a DOM (a set of linked objects), and write converters for both HTML and PDF output. This is a much more limited problem than converting HTML to PDF, and hence much easier to implement.
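A minimal sketch of that approach in pure Python: a tiny document model with one HTML renderer and one PDF renderer. To stay dependency-free for GAE, the PDF side is a deliberately toy writer (one page, Latin-1 text only, built-in Helvetica, no line wrapping or page breaks). The class and function names are hypothetical; where the environment allows, a proper PDF library such as ReportLab would sit behind the same interface.

```python
from dataclasses import dataclass
from html import escape
from typing import List, Union


@dataclass
class Heading:
    text: str


@dataclass
class Paragraph:
    text: str


Block = Union[Heading, Paragraph]


def to_html(blocks: List[Block]) -> str:
    parts = []
    for b in blocks:
        tag = "h1" if isinstance(b, Heading) else "p"
        parts.append(f"<{tag}>{escape(b.text)}</{tag}>")
    return "<html><body>" + "".join(parts) + "</body></html>"


def _pdf_string(text: str) -> str:
    # Backslash and parentheses must be escaped inside a PDF literal string.
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def to_pdf(blocks: List[Block]) -> bytes:
    # Content stream: one line of text per block, placed top-down on an A4 page.
    ops, y = ["BT"], 800
    for b in blocks:
        size = 18 if isinstance(b, Heading) else 11
        ops.append(f"/F1 {size} Tf 1 0 0 1 50 {y} Tm ({_pdf_string(b.text)}) Tj")
        y -= size + 10
    ops.append("ET")
    content = "\n".join(ops).encode("latin-1")

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 595 842] "
        b"/Resources << /Font << /F1 5 0 R >> >> /Contents 4 0 R >>",
        b"<< /Length " + str(len(content)).encode() + b" >>\nstream\n"
        + content + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for i, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset recorded for the xref table
        out += f"{i} 0 obj\n".encode() + body + b"\nendobj\n"
    xref_pos = len(out)
    out += f"xref\n0 {len(objects) + 1}\n".encode()
    out += b"0000000000 65535 f \n"  # each xref entry is exactly 20 bytes
    for off in offsets:
        out += f"{off:010d} 00000 n \n".encode()
    out += (f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\n"
            f"startxref\n{xref_pos}\n%%EOF\n").encode()
    return bytes(out)
```

Usage: build `report = [Heading("Monthly report"), Paragraph("Sales (Q1) up")]` once, then serve `to_html(report)` as the web page and write `to_pdf(report)` to a `.pdf` file. Changing the report structure means changing the model, and both outputs follow.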