Python library for dynamic documents

Python library for dynamic documents - python

I want to write a script that generates reports for each team in my unit where each report uses the same template, but where the numbers specific to each team is used for each report. The report should be in a format like .pdf that non-programmers know how to open and read. This is in many ways similar to rmarkdown for R, but the reports I want to generate are based on data from code already written in python.
The solution I am looking for does not need to export directly to pdf. It can export to markdown and then I know how to convert. I do not need any fancier formatting than what markdown provides. It does not need to be markdown, but I know how to do everything else in markdown, if I only find a way to dynamically populate numbers and text in a markdown template from python code.
What I need is something that is similar to the code block below, but on a bigger scale and instead of printing output on screen this would saved to a file (.md or .pdf) that can then be shared with each team.
user = {'name':'John Doe', 'email':'jd#example.com'}
print('Name is {}, and email is {}'.format(user["name"], user["email"]))
So the desired functionality heavily influenced by my previous experience using rmarkdown would look something like the code block below, where the the template is a string or a file read as a string, with placeholders that will be populated from variables (or Dicts or objects) from the python code. Then the output can be saved and shared with the teams.
user = {'name':'John Doe', 'email':'jd#example.com'}
template = 'Name is `user["name"]`, and email is `user["email"]`'
output = render(template, user)
When trying to find a rmarkdown equivalent in python, I have found a lot of pointers to Jupyter Notebook which I am familiar with, and very much like, but it is not what I am looking for, as the point is not to share the code, only a rendered output.

Since this question was up-voted I want to answer my own question, as I found a solution that was perfect for me. In the end I shared these reports in a repo, so I write the reports in markdown and do not convert them to PDF. The reason I still think this is an answer to my original quesiton is that this works similar to creating markdown in Rmarkdown which was the core of my question, and markdown can easily be converted to PDF.
I solved this by using a library for backend generated HTML pages. I happened to use jinja2 but there are many other options.
First you need a template file in markdown. Let say this is template.md:
## Overview
**Name:** {{repo.name}}<br>
**URL:** {{repo.url}}
| Branch name | Days since last edit |
|---|---|
{% for branch in repo.branches %}
|{{branch[0]]}}|{{branch[1]}}|
{% endfor %}
And then you have use this in your python script:
from jinja2 import Template
import codecs
#create an dict will all data that will be populate the template
repo = {}
repo.name = 'training-kit'
repo.url = 'https://github.com/github/training-kit'
repo.branches = [
['master',15],
['dev',2]
]
#render the template
with open('template.md', 'r') as file:
template = Template(file.read(),trim_blocks=True)
rendered_file = template.render(repo=repo)
#output the file
output_file = codecs.open("report.md", "w", "utf-8")
output_file.write(rendered_file)
output_file.close()
If you are OK with your dynamic doc being in markdown you are done and the report is written to report.py. If you want PDF you can use pandoc to convert.

I would strongly recommend to install and use the pyFPDF Library, that enables you to write and export PDF files directly from python. The Library was ported from php and offers the same functionality as it's php-variant.
1.) Clone and install pyFPDF
Git-Bash:
git clone https://github.com/reingart/pyfpdf.git
cd pyfpdf
python setup.py install
2.) After successfull installation, you can use python code similar as if you'd work with fpdf in php like:
from fpdf import FPDF
pdf = FPDF()
pdf.add_page()
pdf.set_xy(0, 0)
pdf.set_font('arial', 'B', 13.0)
pdf.cell(ln=0, h=5.0, align='L', w=0, txt="Hello", border=0)
pdf.output('myTest.pdf', 'F')
For more Information, take a look at:
https://pypi.org/project/fpdf/
To work with pyFPDF clone repo from: https://github.com/reingart/pyfpdf
pyFPDF Documentation:
https://pyfpdf.readthedocs.io/en/latest/Tutorial/index.html

Related

Jupyter nbconvert LaTex Export Theme

I am using Jupyter Notebook nbconvert (Save as menu) to export as pdf via Latex. However, the pdf file is not in a good shape. For example, some wide tables are shown well. I would prefer to have a box for tables to be resized to the width of the page. Is there any style, template that I can use to have nice reports and how I may ask nbconverter to use that style?
Here is the Latex output:
I would like something like this:

Looks like Pandas gained a ._repr_latex_() method in version 0.23. You'll need to set pd.options.display.latex.repr=True to activate it.
Without latex repr:
With latex repr:
Check out the options to get the formatting close to what you want. In order to match your desired output exactly, you'll need to use a custom latex template.
Edited to provide more information on templates:
Start here for general information about templates. You can create a .tplx file in the same path as your notebook and specify it as the template when running nbconvert from the command line: !jupyter nbconvert --to python 'example.ipynb' --stdout --template=my_custom_template.tplx. Alternatively, you can specify a default template to use when exporting as Latex via the menu by modifying the jupyter_notebook_config.py file in your ~.jupyter directory. If this file doesn't exist already, you can generate it by running the command jupyter notebook --generate-config from the command line. I have my template sitting in the ~/.jupyter directory as well, so I added the following to my jupyter_notebook_config.py:
# Insert this at the top of the file to allow you to reference
# a template in the ~.jupyter directory
import os.path
import sys
sys.path.insert(0, os.path.expanduser("~") + '/.jupyter')
# Insert this at the bottom of the file:
c.LatexExporter.template_file = 'my_template' # no .tplx extension here
c.LatexExporter.template_path = ['.', os.path.expanduser("~") + '/.jupyter'] # nbconvert will look in ~/.jupyter
To understand a bit about how the templates work, start by taking a look at null.tplx. The line ((*- for cell in nb.cells -*)) loops over all the cells in the notebook. The if statements that follow check the type of each cell and call the appropriate block.
The other templates extend null.tplx. Each template defines (or redefines) some of the blocks. The hierarchy is null->display_priority->document_contents->base->style_*->article.
Your custom template should probably extend article.tplx and add some Latex commands to the header that sets up the tables the way you want. Take a look at this blog post for an example of setting up a custom template.

Any setting that change the table size to fit it in the width of the page?
Latex code is something like this: \resizebox*{\textwidth}{!}{%

Boxes are displayed instead of text in pdfkit - Python3

I am trying to convert an HTML file to pdf using pdfkit python library.
I followed the documentation from here.
Currently, I am trying to convert plain texts to PDF instead of whole html document. Everything is working fine but instead of text, I am seeing boxes in the generated PDF.
This is my code.
import pdfkit
config = pdfkit.configuration(wkhtmltopdf='/usr/local/bin/wkhtmltopdf/wkhtmltox/bin/wkhtmltopdf')
content = 'This is a paragraph which I am trying to convert to pdf.'
pdfkit.from_string(content,'test.pdf',configuration=config)
This is the output.
Instead of the text 'This is a paragraph which I am trying to convert to pdf.', converted PDF contains boxes.
Any help is appreciated.
Thank you :)

Unable to reproduce the issue with Python 2.7 on Ubuntu 16.04 and it works fine on the specs mentioned. From my understanding this problem is from your Operating System not having the font or encoding in which the file is being generated by the pdfkit.
Maybe try doing this:
import pdfkit
config = pdfkit.configuration(wkhtmltopdf='/usr/local/bin/wkhtmltopdf/wkhtmltox/bin/wkhtmltopdf')
content = 'This is a paragraph which I am trying to convert to pdf.'
options = {
'encoding':'utf-8',
}
pdfkit.from_string(content,'test.pdf',configuration=config, options=options)
The options to modify pdf can be added as dictionary and assigned to options argument in from_string functions. The list of options can be found here.

This issue is referred here Include custom fonts in AWS Lambda
if you are using pdfkit on lambda you will have to setup ENV variables as
"FONT_CONFIG_PATH": '/opt/fonts/'
"FONTCONFIG_FILE": '/opt/fonts/fonts.conf'
if this problem is in the local environment a fresh installation of wkhtmltopdf must resolve this

Using pandoc and pandoc-citeproc with Jupyter notebooks

I am developing an infrastructure where developers can document their verification tests using Jupyter notebooks. One part of the infrastructure will be a python script that can convert their .ipynb files to .html files to provide public-facing documentation of their tests.
Using the nbconvert module does most of what I want, but I would like to allow citations and references in the final HTML file. I can use pypandoc to generate HTML text that converts the citations to proper inlined syntax and adds a References section:
from urllib import urlopen
import nbformat
import pypandoc
from nbconvert import MarkdownExporter
response = urlopen('SimpleExample.ipynb').read().decode()
notebook = nbformat.reads(response, as_version=4)
exporter = MarkdownExporter()
(body, resources) = exporter.from_notebook_node(notebook)
filters = ['pandoc-citeproc']
extra_args = ['--bibliography="ref.bib"',
'--reference-links',
'--csl=MWR.csl']
new_body = pypandoc.convert_text(body,
'html',
'md',
filters=filters,
extra_args=extra_args)
The problem is that this generated HTML loses all of the considerable formatting and other capabilities provided by nbconvert.HTMLExporter.
My question is, is there a straightforward way to merge the results of nbconvert.HTMLExporter and pypandoc.convert_text() such that I get mostly the former, with inline citations and a Reference section added from the latter?

I don't know that this necessarily counts as "straightforward" but I was able to come up with a solution. It involves writing a class that inherits from nbconvert.preprocessors.Preprocessor and implements the preprocess(self, nb, resources) method. Here is what preprocess() does:
Loop over every cell in the notebook and store a set of citation keys (these are of the form [#bibtex_key]
Create a short body of text consisting of only these citation keys, each separated by '\n\n'
Use the pandoc conversion above to generate HTML text from this short body of text. If num_cite is the number of citations, the first num_cite lines of the generated text will be the inline versions of the citations (e.g. '(Author, Year)'); the remaining lines will be the content of the references section.
Go back through each cell and substitute the inline text of each citation for its key.
Add a cell to the notebook with ## References
Add a cell to the notebook with the content of the references section
Now, when an HTMLExporter, using this Preprocessor, converts a notebook, the results will have inline citations, a reference section, and all of the formatting you expect from the HTMLExporter.

Libre/Open and writer2Latex, call from Python

I came across this wonderful extension for Open/Libre Office, writer2Latex, which can convert my whole document into TeX format.It's working great so I was wondering if it is possible to skip open/libre office application and call writer2Latex directly from Python? I would like that my python application just calls writer2Latex with the word document as input, and get generated latex file.

This should be doable using Python + uno. The method is very similar to exporting to PDF, which is more common for people to do than export to TeX / Latex. LibreOffice / OpenOffice have a set of export filters that can just be changed.
See DocumentHacker.com for some general documentation about using Python + LibreOffice / OpenOffice, such as how to open documents. The recipe in the cookbook that you need to modify is "Converting to PDF". Simply replace the output filter with the latex filter name
#already have the document open
from com.sun.star.beans import PropertyValue
outputfiltername = "writer_pdf_Export" #for PDF
property = (
PropertyValue( "FilterName" , 0, outputfiltername , 0 ),
)
document.storeToURL("file:///home/my_username/output.pdf", property)
I'm not sure what the FilterName should be however, maybe you can work it out from the writer3Latex documentation. I found a PDF that suggests it should be:
outputfiltername = "org.openoffice.da.writer2latex"
but I haven't tested it (search for "FilterName").

How do you change the code example font size in LaTeX PDF output with Sphinx?

I find the default code example font in the PDF generated by Sphinx to be far too large.
I've tried getting my hands dirty in the generated .tex file inserting font size commands like \tiny above the code blocks, but it just makes the line above the code block tiny, not the code block itself.
I'm not sure what else to do - I'm an absolute beginner with LaTeX.

I worked it out. Pygments uses a \begin{Verbatim} block to denote code snippets, which uses the fancyvrb package. The documentation I found (warning: PDF) mentions a formatcom option for the verbatim block.
Pygments' latex writer source indicates an instance variable, verboptions, is stapled to the end of each verbatim block and Sphinx' latex bridge lets you replace the LatexFormatter.
At the top of my conf.py file, I added the following:
from sphinx.highlighting import PygmentsBridge
from pygments.formatters.latex import LatexFormatter
class CustomLatexFormatter(LatexFormatter):
def __init__(self, **options):
super(CustomLatexFormatter, self).__init__(**options)
self.verboptions = r"formatcom=\footnotesize"
PygmentsBridge.latex_formatter = CustomLatexFormatter
\footnotesize was my preference, but a list of sizes is available here

To change Latex Output options in sphinx, set the relevant latex_elements key in the build configuration file, documentation on this is located here.
To change the font size for all fonts use pointsize.
E.g.
latex_elements = {
'pointsize':'10pt'
}
To change other Latex settings that are listed in the documetntation use preamble or use a custom document class in latex_documents.
E.g.
mypreamble='''customlatexstuffgoeshere
'''
latex_elements = {
'papersize':'letterpaper',
'pointsize':'11pt',
'preamble':mypreamble
}
Reading the Sphinx sourcecode by default the code in LatexWriter sets code snippets to the \code latex primitive.
So what you want to do is replace the \code with a suitable replacement.
This is done by including a Latex command like \newcommand{\code}[1]{\texttt{\tiny{#1}}} either as part of the preamble or as part of a custom document class for sphinx that gets set in latex_documents as the documentclass key. An example sphinx document class is avaliable here.
Other than just making it smaller with \tiny you can modify the latex_documents document class or the latex_elements preamble to use the Latex package listings for more fancy code formatting like in the StackOverflow question here.
The package stuff from the linked post would go as a custom document class and the redefinition similar to \newcommand{\code}[1]{\begin{lstlisting} #1 \end{lstlisting}} would be part of the preamble.
Alternatively you could write a sphinx extension that extends the default latex writer with a custom latex writer of your choosing though that is significantly more effort.
Other relevant StackOverflow questions include
Creating Math Macros with Sphinx
How do I disable colors in LaTeX output generated from sphinx?
sphinx customization of latexpdf output?

You can add a modified Verbatim command into your PREAMBLE (Note in this case the font size is changed to tiny)
\renewcommand{\Verbatim}[1][1]{%
% list starts new par, but we don't want it to be set apart vertically
\bgroup\parskip=0pt%
\smallskip%
% The list environement is needed to control perfectly the vertical
% space.
\list{}{%
\setlength\parskip{0pt}%
\setlength\itemsep{0ex}%
\setlength\topsep{0ex}%
\setlength\partopsep{0pt}%
\setlength\leftmargin{10pt}%
}%
\item\MakeFramed {\FrameRestore}%
\tiny % <---------------- To be changed!
\OriginalVerbatim[#1]%
}

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.