I am looking for the most accurate tool for generating PDFs in Python that works like Jinja does for HTML.
What are your suggestions?
As answered by jbochi, ReportLab is the foundation for almost all Python projects that generate PDF.
But for your needs you might want to check out Pisa / xhtml2pdf. You would generate your HTML with a Jinja template and then use Pisa to convert the HTML to PDF. Pisa is built on top of ReportLab.
Edit: another option I'd forgotten about is wkhtmltopdf
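The Jinja-then-Pisa route described above could look roughly like the sketch below. The template text, the helper names render_report_html and html_to_pdf, and the output path are illustrative assumptions; only jinja2.Template and pisa.CreatePDF are library calls.

```python
from jinja2 import Template

# Hypothetical template; in practice this would live in its own .html file.
REPORT_TEMPLATE = """
<html>
  <body>
    <h1>{{ title }}</h1>
    <ul>
    {% for item in items %}
      <li>{{ item }}</li>
    {% endfor %}
    </ul>
  </body>
</html>
"""


def render_report_html(title, items):
    """Fill the Jinja template and return the resulting HTML string."""
    return Template(REPORT_TEMPLATE).render(title=title, items=items)


def html_to_pdf(html, pdf_path):
    """Convert an HTML string to a PDF file with xhtml2pdf (Pisa)."""
    # Imported lazily so the rendering step works without xhtml2pdf installed.
    from xhtml2pdf import pisa

    with open(pdf_path, "wb") as pdf_file:
        status = pisa.CreatePDF(html, dest=pdf_file)
    return not status.err  # True when the conversion reported no errors


html = render_report_html("Monthly report", ["alpha", "beta"])
# html_to_pdf(html, "report.pdf")  # requires: pip install xhtml2pdf
```

Keeping the rendering and the conversion in separate functions makes it easy to swap the converter later (for example for wkhtmltopdf) without touching the templates.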
Have a look at ReportLab Toolkit.
You can use templates only with the commercial version, though.
There's now a new kid on the block called WeasyPrint.
I had exactly the same requirement as the OP. Unfortunately WeasyPrint wasn't a viable solution, because I needed very exact positioning and barcode support. After a few days of work I finished a reportlab XML wrapper with Jinja2 support.
The code can be found on GitHub
including an example XML which generates the following PDF.
What about python/jinja to rst/html and html/rst to pdf using either rst2pdf or pandoc?
Both of these have worked well for me but, like plaes, I may try WeasyPrint in the future.
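As a rough illustration of the pandoc half of this suggestion, the sketch below shells out to pandoc from Python. The helper names and file names are assumptions made for the example, and pandoc needs a PDF engine (a LaTeX installation by default) to produce PDF output.

```python
import shutil
import subprocess


def pandoc_command(source_path, pdf_path):
    """Build the pandoc command line that converts a document to PDF."""
    return ["pandoc", source_path, "-o", pdf_path]


def convert_to_pdf(source_path, pdf_path):
    """Run pandoc; raises CalledProcessError if the conversion fails."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc is not installed or not on PATH")
    subprocess.run(pandoc_command(source_path, pdf_path), check=True)


# The source file would be the rst/html produced by the Jinja step.
print(pandoc_command("report.rst", "report.pdf"))
```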
What more accurate tool for PDF in Python that works like Jinja than Jinja itself?
You just have to make sure that the Jinja block, variable, and comment identification strings do not conflict with the LaTeX commands. Once you change the Jinja environment to mimic the LaTeX environment you're ready to go!
Here's a snippet that works out of the box:
Python Source: ./create_pdf.py
import os
import jinja2

# Raw strings keep the backslashes intact and avoid invalid-escape warnings.
latex_jinja_env = jinja2.Environment(
    block_start_string=r'\BLOCK{',
    block_end_string='}',
    variable_start_string=r'\VAR{',
    variable_end_string='}',
    comment_start_string=r'\#{',
    comment_end_string='}',
    line_statement_prefix='%%',
    line_comment_prefix='%#',
    trim_blocks=True,
    autoescape=False,
    loader=jinja2.FileSystemLoader(os.path.abspath('./latex/'))
)
template = latex_jinja_env.get_template('latex_template.tex')
# populate a dictionary with the variables of interest
template_vars = {}
template_vars['section_1'] = 'The Section 1 Title'
template_vars['section_2'] = 'The Section 2 Title'
# create a file and save the latex,
# passing the dictionary with variable names to the renderer
with open('./generated_latex.tex', 'w') as output_file:
    output_file.write(template.render(template_vars))
Latex Template: ./latex/latex_template.tex
\documentclass{article}
\begin{document}
\section{Example}
An example document using \LaTeX, Python, and Jinja.
% This is a regular LaTeX comment
\section{\VAR{section_1}}
\begin{itemize}
\BLOCK{ for x in range(0,3) }
\item Counting: \VAR{x}
\BLOCK{ endfor }
\end{itemize}
\#{This is a long-form Jinja comment}
\BLOCK{ if subsection_1_1 }
\subsection{ The subsection }
This appears only if subsection_1_1 variable is passed to renderer.
\BLOCK{ endif }
%# This is a short-form Jinja comment
\section{\VAR{section_2}}
\begin{itemize}
%% for x in range(0,3)
\item Counting: \VAR{x}
%% endfor
\end{itemize}
\end{document}
Now simply call: $> python ./create_pdf.py
Resulting Latex Source: ./generated_latex.tex
\documentclass{article}
\begin{document}
\section{Example}
An example document using \LaTeX, Python, and Jinja.
% This is a regular LaTeX comment
\section{The Section 1 Title}
\begin{itemize}
\item Counting: 0
\item Counting: 1
\item Counting: 2
\end{itemize}
\section{The Section 2 Title}
\begin{itemize}
\item Counting: 0
\item Counting: 1
\item Counting: 2
\end{itemize}
\end{document}
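The generated .tex file still has to be compiled to obtain the PDF. A minimal sketch, assuming pdflatex is installed and on PATH; the helper names and the output directory are illustrative and not part of the original answer.

```python
import subprocess
from pathlib import Path


def pdflatex_command(tex_path, out_dir="."):
    """Build a non-interactive pdflatex command line for one .tex file."""
    return [
        "pdflatex",
        "-interaction=nonstopmode",        # do not stop and prompt on errors
        f"-output-directory={out_dir}",
        str(tex_path),
    ]


def compile_latex(tex_path, out_dir="."):
    """Run pdflatex and return the path where the PDF is expected."""
    subprocess.run(pdflatex_command(tex_path, out_dir), check=True)
    return Path(out_dir) / (Path(tex_path).stem + ".pdf")


print(pdflatex_command("./generated_latex.tex"))
# compile_latex("./generated_latex.tex")  # requires a LaTeX installation
```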
Generated Pdf:
References:
About Django's syntax conflict with LaTex
About latex templates
About passing a dict to render_template
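One caveat with the LaTeX approach above: with autoescape disabled, values are inserted verbatim, so characters such as & or % in the data can break the LaTeX build. The helper below is a sketch of one way to escape them; the name latex_escape and the replacement table are my own assumptions, not part of the original snippet.

```python
# Characters that have a special meaning in LaTeX and a safe replacement.
LATEX_SPECIAL_CHARS = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def latex_escape(value):
    """Escape LaTeX special characters one character at a time."""
    # Working per character avoids re-escaping the backslashes we insert.
    return "".join(LATEX_SPECIAL_CHARS.get(ch, ch) for ch in str(value))


print(latex_escape("50% of R&D_budget"))
```

The function could be registered as a filter on the environment from the snippet, for instance latex_jinja_env.filters['latex_escape'] = latex_escape, and then applied in the template as \VAR{section_1|latex_escape}.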
If you want to use an existing PDF as a template without altering the original document, you can use the Dhek template editor, which lets you define areas (bounds, name, type) in a separate template file.
The template is saved in JSON format, so it can be parsed in Python to fill the areas over the PDF and generate the final document (e.g. with values from a web form).
See documentation at https://github.com/applicius/dhek .
[EDIT]
Initial answer was from the author of dhek.
I have used this tool and it is great if your form has not been generated in the usual way (it even works on PDFs made from images).
After you have downloaded, unzipped, and run DHEK (no install needed, it is portable), you can select areas and give them a name:
You can then save the "mapping" to JSON so you can get the positions and dimensions of the areas:
{
"pages": [
{
"areas": [
{
"name": "FirstName",
"x": 198.48648648648648,
"type": "text",
"y": 151.22779922779924,
"height": 15.75289575289574,
"width": 181.15830115830119
},
{
"name": "LastName",
"x": 195.33590733590734,
"type": "text",
"y": 176.43243243243245,
"height": 18.115830115830107,
"width": 185.0965250965251
}
]
}
],
"format": "dhek-1.0.13"
}
You can then use these positions with reportlab to create a PDF that contains the text:
from reportlab.pdfgen.canvas import Canvas
def write_text(
    canvas: Canvas, txt: str, x: float, y: float, height: float, in_middle: bool = True
) -> None:
    """Write text in a form (in middle of height)"""
    if canvas.bottomup:
        y = canvas._pagesize[1] - y
    canvas.drawString(x, y + height / 2, txt)
def create_overlay(overlay_path: str):
    """
    Create the data that will be overlayed on top
    of the form that we want to fill
    """
    c = Canvas(overlay_path, bottomup=0)  # DHEK has (0,0) at top-left
    write_text(c, "Mike", 198.48648648648648, 151.22779922779924, 15.75289575289574)
    write_text(c, "Jagger", 195.33590733590734, 176.43243243243245, 18.115830115830107)
    c.save()
create_overlay("form_overlay.pdf")
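Rather than hard-coding the coordinates, the positions could be read from the saved mapping. The sketch below assumes the JSON layout shown earlier (with the numbers rounded for brevity); the function name text_placements and the values dictionary are illustrative.

```python
import json

# Rounded copy of the sample mapping saved from DHEK.
MAPPING_JSON = """
{"pages": [{"areas": [
  {"name": "FirstName", "type": "text", "x": 198.49, "y": 151.23, "height": 15.75, "width": 181.16},
  {"name": "LastName", "type": "text", "x": 195.34, "y": 176.43, "height": 18.12, "width": 185.10}
]}], "format": "dhek-1.0.13"}
"""


def text_placements(mapping, values):
    """Yield (page_index, text, x, y, height) for every named area that has a value."""
    for page_index, page in enumerate(mapping["pages"]):
        for area in page.get("areas", []):
            if area["name"] in values:
                yield page_index, values[area["name"]], area["x"], area["y"], area["height"]


mapping = json.loads(MAPPING_JSON)
placements = list(text_placements(mapping, {"FirstName": "Mike", "LastName": "Jagger"}))
for placement in placements:
    print(placement)
```

Each tuple carries the arguments that write_text expects after the canvas, so the overlay function could loop over the placements instead of repeating literal numbers.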
You can then use any tool / pdf library (e.g. pdfrw) to merge the two in a single page:
import pdfrw
def merge_pdfs(form_pdf, overlay_pdf, output):
    """
    Merge the specified fillable form PDF with the
    overlay PDF and save the output
    """
    form = pdfrw.PdfReader(form_pdf)
    olay = pdfrw.PdfReader(overlay_pdf)
    for form_page, overlay_page in zip(form.pages, olay.pages):
        merge_obj = pdfrw.PageMerge()
        overlay = merge_obj.add(overlay_page)[0]
        pdfrw.PageMerge(form_page).add(overlay).render()
    writer = pdfrw.PdfWriter()
    writer.write(output, form)
merge_pdfs("form.pdf", "form_overlay.pdf", "form_filled.pdf")
(The last part of the code, which creates the overlay and merges it, comes from the excellent blog "Mouse vs Python": https://www.blog.pythonlibrary.org/2018/05/22/filling-pdf-forms-with-python/)
There is also the library pdfjinja, which is made for this purpose: https://github.com/rammie/pdfjinja
It uses annotations to create template values.
In my use case, I didn't have a PDF with proper form fields so the solution suggested by cchantep was more suitable.
I'm fetching data for a mini-blog from an endpoint. Each post in the JSON bin has an image property that holds a link to a background image from Unsplash.
As in:
[
{
"id": 1,
"title": "The Life of Cactus",
"body": "Nori grape silver...",
"date": "31-08-2022",
"author": "Bakes Parker",
"image": "https://images.unsplash.com/photo-1544674644-c8c919e3cfbf?ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&ixlib=rb-1.2.1&auto=format&fit=crop&w=755&q=80"
},
Inside my Flask application, I get respective posts and render them with Flask and use dynamic values with Jinja. Here's the function that gets the posts:
@app.route("/post<int:num>")
def return_post(num):
    API_ENDPOINT = "https://api.npoint.io/xxx"
    blog_stories = requests.get(API_ENDPOINT).json()
    requested_story = None
    for blog_post in blog_stories:
        if blog_post['id'] == num:
            requested_story = blog_post
    return render_template("post.html", requested_post=requested_story)
Inside post.html, I try to make the background of the respective post with this line
<style>
header {
background-image: url("{{requested_post['image']}}");
}
</style>
but the image doesn't show; the default #cccc background color appears instead.
I also tried to set it inline, but that doesn't seem possible because of the CSS curly brace restrictions.
As in:
<header class="masthead" style="background-image:{{requested_post['image']}} ;">
I can't do the above.
So I found out that the images from the endpoint were actually rendering, but they were overridden by properties from the stylesheet, which is quite odd because I defined the style inside the style tag.
Developer Tools shows that the image is there
Sorry for any inconvenience.
Nothing jumps out as wrong. Try importing current_app from flask, then printing out your json to debug. Perhaps you are not getting the data you are expecting.
from flask import current_app
@app.route("/post<int:num>")
def return_post(num):
    API_ENDPOINT = "https://api.npoint.io/xxx"
    blog_stories = requests.get(API_ENDPOINT).json()
    for requested_post in blog_stories:
        if requested_post.get('id') == num:
            current_app.logger.debug(requested_post)
            return render_template("post.html", requested_post=requested_post)
    return render_template("error.html")
I suggest you think about redirecting to an error template or route here.
Can you post your complete template? There may be clues there.
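The lookup loop in the answer above can also be factored into a small helper that is easy to test without Flask. This is only a sketch: the name find_post and the second sample post are made up for illustration.

```python
def find_post(posts, post_id):
    """Return the first post whose 'id' matches, or None when there is no match."""
    return next((post for post in posts if post.get("id") == post_id), None)


# Sample data in the shape returned by the endpoint (second entry is hypothetical).
posts = [
    {"id": 1, "title": "The Life of Cactus"},
    {"id": 2, "title": "A second post"},
]

print(find_post(posts, 1))
print(find_post(posts, 99))
```

In the view, a None result is the point where an error template or flask.abort(404) could be used instead of rendering post.html with missing data.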
I'm new to Pelican and I recently made my personal blog website.
My website link : hosted on github pages
Source code : on github
The website is hosted on the gh-pages branch with the help of ghp-import.
Here I want to generate a custom welcome/home page for each category. I don't want to display all the articles of that specific category. I tried this by creating an index.html file in each category's folder, but it didn't work. I also searched on SO and found some related answers, but I cannot figure out what to do in my case.
Here are those questions (mentioning them because they may be helpful):
How do I choose a category page to be the home page for a Pelican site?
How to customize individual category pages in Pelican
Introduction pages for categories in Pelican
What I want?
For example, in the Shah-Aayush.github.io/content/notes/ category, I don't want to display all the pages that reside in the notes category with a "more" button to expand each one. Instead, I want to display a custom introduction page, which is index.md. Currently, when I click on the notes category on my website, it opens the default generated https://shah-aayush.github.io/category/notes.html page, but I want it to open https://shah-aayush.github.io/my-notes.html, generated from index.md.
Another example: in the profile category, I want to display the contents of profiles.md instead of listing everything that resides in the profile folder/category (10ff.md, spotify.md, profiles.md) with a "more" button to expand each.
What it displays now when I click on profile: https://shah-aayush.github.io/category/profiles.html
What I want it to display when I click on the profile category: https://shah-aayush.github.io/my-profiles.html
How to achieve this? Thanks in advance :)
I do not fully understand your question. The Pelican metadata has a field called save_as, which is described on the documentation page. I have used this feature on my own website.
https://docs.getpelican.com/en/latest/content.html
On your index.md file, you can have a metadata line like this,
Title: My super title
save_as: my-notes.html
Date: 2010-12-03 10:20
Modified: 2010-12-05 19:30
Category: Python
Tags: pelican, publishing
Slug: my-super-post
Authors: Alexis Metaireau, Conan Doyle
Summary: Short version for index and feeds
The second line will direct Pelican to save the output as my-notes.html.
I have a somewhat messy, but working solution. Here is the relevant part of my configuration file.
import os
from jinja2 import Environment, ChoiceLoader, FileSystemLoader
from markdown import Markdown
MARKDOWN = {
    "extension_configs": {
        "markdown.extensions.attr_list": {},
        "markdown.extensions.def_list": {},
        "markdown.extensions.footnotes": {},
        "markdown.extensions.md_in_html": {},
        "markdown.extensions.meta": {},
        "markdown.extensions.nl2br": {},
        "markdown.extensions.tables": {},
        "markdown.extensions.toc": {"title": "", "toc_depth": "2-6"},
        "pymdownx.emoji": {"title": "long"},
        "pymdownx.caret": {},
        "pymdownx.tilde": {},
        "pymdownx.mark": {},
        "pymdownx.tasklist": {},
        "pymdownx.details": {},
        "pymdownx.superfences": {"preserve_tabs": True},
        "pymdownx.highlight": {"css_class": "highlight", "use_pygments": True, "noclasses": True, "pygments_style": "Dani_dark"},
        "customblocks": {"generators": {"question": "customblocks.custom_generators:question"}},
        "linkcard": {},
        "tablespan": {},
        "mdx_include_lines": {"base_path": "content/code", "encoding": "utf-8", "line_nums": False},
        "markdown_include.include": {"encoding": "utf-8"},
    },
    "output_format": "html5"
}
md_extensions = [e for e in MARKDOWN["extension_configs"].keys()]
md = Markdown(extensions=md_extensions, extension_configs=MARKDOWN["extension_configs"], output_format="html5")
# of course, you should define the extensions according to your needs, the example just shows how I do it
def category_details(cat_name):
    if cat_name not in CATEGORY_DETAILS:
        return ""
    cat_details_obj = CATEGORY_DETAILS[cat_name]
    if cat_details_obj.long_description != "":
        env = Environment(loader=ChoiceLoader([FileSystemLoader(os.path.join("themes", "Elegant", "templates"))]), **JINJA_ENVIRONMENT)
        jinja_processed = env.from_string(cat_details_obj.long_description).render()
        return md.convert(jinja_processed)
    else:
        return cat_details_obj.short_description
JINJA_ENVIRONMENT = {"extensions": ["jinja2.ext.do", "jinja2.ext.i18n"]}
JINJA_FILTERS = {
"cat_details": category_details,
}
CATEGORY_DETAILS_PATH = "cat_details"
class CategoryDetails:
    name = ""
    short_description = ""
    long_description = ""
    def __init__(self, name, short_description="", desc_filename=""):
        self.name = name
        self.short_description = short_description
        desc_file_path = os.path.join(PATH, CATEGORY_DETAILS_PATH, desc_filename)
        if desc_filename not in ["", " ", None] and os.path.isfile(desc_file_path):
            with open(desc_file_path, "r", encoding="utf-8") as f:
                self.long_description = f.read()
        else:
            self.long_description = self.short_description
CATEGORY_DETAILS = {
"Books": CategoryDetails("Books", "An awesome list of books to read and enjoy :)", "Books.txt"),
"Katalonski - Català": CategoryDetails("Katalonski - Català", "Una llengua boniquíssima! :heart:"),
"Francés/DuoLingo": CategoryDetails("Francés/DuoLingo", "Notas, vocabularios y más cosas del curso de francés en DuoLingo", "francés_DuoLingo.txt"),
}
As you might guess, the next step is to create the cat_details folder with a .txt file for every category, add each category to the CATEGORY_DETAILS dictionary, and then use the new filter in your template, for example in index.html:
{% if category %}
{{ category.name|cat_details|safe }}<hr />
{% endif %}
The only possible inconvenience is that this doesn't remove the article list. The articles from the category will appear after the category details. To solve this, you may tweak your template. :)
I said it's messy... :)
There is, undoubtedly, a better solution. And yes, this was made using the folder structure for categories, but it should work with metadata, too.
I am a complete beginner in programming and Python. I have a map application built with dash-leaflet that includes several (~10) GeoJSON files through the dl.GeoJSON component. I would like to show a popup with all the properties of each feature in these files. Before dl.GeoJSON was implemented, I used to create my layers by reading my GeoJSON and defining the popup like this:
def compute_geojson(gjson):
    geojson = json.load(open(gjson["path"], encoding='utf8'))
    if 'Polygon' in geojson["features"][0]["geometry"]["type"]:
        data = [
            dl.Polygon(
                positions=get_geom(feat),
                children=[
                    dl.Popup([html.P(k + " : " + str(v)) for k, v in feat["properties"].items()], maxHeight=300),
                ],
                color=get_color(gjson, feat), weight=0.2, fillOpacity=gjson["opacity"], stroke=True
            ) for feat in geojson['features']
        ]
    ...
I would like to do this for all my GeoJSON files (which have different structures) with the dl.GeoJSON component, because it should render faster than my method. Is it possible? I tried some JavaScript with onEachFeature but didn't succeed.
Thanks
The simplest solution would be to add a property named popup with the desired popup content to each feature, as the GeoJSON component will render it as a popup automatically:
import dash_leaflet as dl
import dash_leaflet.express as dlx
data = dlx.dicts_to_geojson([dict(lat=-37.8, lon=175.6, popup="I am a popup")])
geojson = dl.GeoJSON(data=data)
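For existing GeoJSON files with differing structures, the popup property could be generated from whatever properties each feature already has. The sketch below uses only the standard library; the function name add_popup_property is hypothetical, and whether the `<br>` separator renders as a line break depends on how the component binds the popup, which I have not verified.

```python
import html


def add_popup_property(geojson):
    """Return a copy of a FeatureCollection whose features carry a 'popup' property."""
    features = []
    for feature in geojson["features"]:
        props = dict(feature.get("properties") or {})
        # One "key : value" line per existing property, HTML-escaped.
        lines = [f"{html.escape(str(k))} : {html.escape(str(v))}" for k, v in props.items()]
        props["popup"] = "<br>".join(lines)
        features.append({**feature, "properties": props})
    return {**geojson, "features": features}


sample = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [175.6, -37.8]},
            "properties": {"name": "A & B", "area": 3},
        }
    ],
}
with_popups = add_popup_property(sample)
print(with_popups["features"][0]["properties"]["popup"])
```

The result can then be passed as the data argument of dl.GeoJSON in place of the original dictionary. The input is left unmodified, so the same source data can still be used elsewhere.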
If you need more customization options and/or prefer not to add properties (e.g. for performance reasons), you would need to implement a custom onEachFeature function. If you create a .js file in your assets folder with content like,
window.someNamespace = Object.assign({}, window.someNamespace, {
    someSubNamespace: {
        bindPopup: function(feature, layer) {
            const props = feature.properties;
            delete props.cluster;
            layer.bindPopup(JSON.stringify(props))
        }
    }
});
you can bind the function like this,
import dash_leaflet as dl
from dash_extensions.javascript import Namespace
ns = Namespace("someNamespace", "someSubNamespace")
geojson = dl.GeoJSON(data=data, options=dict(onEachFeature=ns("bindPopup")))
In the above code examples I am using dash-leaflet==0.1.10 and dash-extensions==0.0.33.
I have a PDF form created using Adobe LiveCycle Designer ES 10.4. I need to fill it using Python so that we can reduce manual labor. I searched the web and read some articles; most of them focused on the pdfrw library. I tried using it and extracted some information from the PDF form, as shown below.
Code
from pdfrw import PdfReader
pdf = PdfReader('sample.pdf')
print(pdf.keys())
print(pdf.Info)
print(pdf.Root.keys())
print('PDF has {} pages'.format(len(pdf.pages)))
Output
['/Root', '/Info', '/ID', '/Size']
{'/CreationDate': "(D:20180822164509+05'30')", '/Creator': '(Adobe LiveCycle Designer ES 10.4)', '/ModDate': "(D:20180822165611+05'30')", '/Producer': '(Adobe XML Form Module Library)'}
['/AcroForm', '/MarkInfo', '/Metadata', '/Names', '/NeedsRendering', '/Pages', '/Perms', '/StructTreeRoot', '/Type']
PDF has 1 pages
I am not sure how to go further and use pdfrw to access the fillable fields of the PDF form and fill them using Python. Is that possible? Any suggestions would be helpful.
You can find the form fields here:
pdf.Root.AcroForm.Fields
or here
pdf.Root.Pages.Kids[page_index].Annots
This is a PdfArray object, which is basically a list.
The name of the field is found here:
pdf.Root.AcroForm.Fields[field_index].T
Other keys include the value, .V
There's a bunch of display information, such as the font, under .AP.N.Resources
However, if you update the value of a field and write out the PDF file, the value might only be displayed when the field has focus, i.e. when it is clicked on.
I haven't figured out how to fix that yet.
I wrote a library called fillpdf, built upon pdfrw, pdf2image, Pillow, and PyPDF2 (pip install fillpdf; the poppler dependency can be installed with conda install -c conda-forge poppler).
Basic usage:
from fillpdf import fillpdfs
fillpdfs.get_form_fields("blank.pdf")
# returns a dictionary of fields
# Set the returned dictionary values and save to a variable
# For radio boxes ('Off' = not filled, 'Yes' = filled)
data_dict = {
'Text2': 'Name',
'Text4': 'LastName',
'box': 'Yes',
}
fillpdfs.write_fillable_pdf('blank.pdf', 'new.pdf', data_dict)
# If you want it flattened:
fillpdfs.flatten_pdf('new.pdf', 'newflat.pdf')
More info here:
https://github.com/t-houssian/fillpdf
If some fields don't fill, you can use fitz (pip install PyMuPDF) and PyPDF2 (pip install PyPDF2) as follows, altering the points as needed:
import fitz
from PyPDF2 import PdfFileReader
file_handle = fitz.open('blank.pdf')
pdf = PdfFileReader(open('blank.pdf','rb'))
box = pdf.getPage(0).mediaBox
w = box.getWidth()
h = box.getHeight()
# For images
image_rectangle = fitz.Rect((w/2)-200,h-255,(w/2)-100,h-118)
pages = pdf.getNumPages() - 1
last_page = file_handle[pages]
last_page._wrapContents()
last_page.insertImage(image_rectangle, filename=f'image.png')
# For text
last_page.insertText(fitz.Point((w/2)-247 , h-478), 'John Smith', fontsize=14, fontname="times-bold")
file_handle.save(f'newpdf.pdf')
Use this to fill every field with its index, which shows where each one sits on the page.
from pdfrw import PdfDict, PdfReader, PdfWriter
template = PdfReader('template.pdf')
page_c = 0
while page_c < len(template.Root.Pages.Kids):  # loop through pages
    annot_c = 0
    while annot_c < len(template.Root.Pages.Kids[page_c].Annots):  # loop through fields
        # write "<field index>-<page index>" into each field
        template.Root.Pages.Kids[page_c].Annots[annot_c].update(PdfDict(V=str(annot_c) + '-' + str(page_c)))
        annot_c = annot_c + 1
    page_c = page_c + 1
PdfWriter().write('output.pdf', template)
AcroForm based Forms using PDFix SDK
def SetFormFieldValue(email, key, open_path, save_path):
    pdfix = GetPdfix()
    if pdfix is None:
        raise Exception('Pdfix Initialization fail')
    # authorize with the credentials passed in as arguments
    if not pdfix.Authorize(email, key):
        raise Exception('Authorization fail : ' + pdfix.GetError())
    doc = pdfix.OpenDoc(open_path, "")
    if doc is None:
        raise Exception('Unable to open pdf : ' + pdfix.GetError())
    field = doc.GetFormFieldByName("Text1")
    if field is not None:
        value = field.GetValue()
        value = "New Value"
        field.SetValue(value)
    if not doc.Save(save_path, kSaveFull):
        raise Exception(pdfix.GetError())
    doc.Close()
    pdfix.Destroy()
A full solution was provided here: How to edit editable pdf using the pdfrw library?
The key part is the:
template_pdf.Root.AcroForm.update(pdfrw.PdfDict(NeedAppearances=pdfrw.PdfObject('true')))
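Putting the pieces from these answers together, a fill routine with pdfrw might look like the sketch below. The function names and the values dictionary are illustrative assumptions, and this targets AcroForm fields; a form authored purely as XFA in LiveCycle may not respond to it.

```python
def field_name(raw_name):
    """Strip the parentheses pdfrw keeps around PDF string values, e.g. '(Text1)'."""
    text = str(raw_name)
    return text[1:-1] if text.startswith("(") and text.endswith(")") else text


def fill_form(template_path, output_path, values):
    """Write a copy of the form with the given field values filled in."""
    import pdfrw  # third-party: pip install pdfrw

    template_pdf = pdfrw.PdfReader(template_path)
    # Ask the viewer to regenerate field appearances so values show without focus.
    template_pdf.Root.AcroForm.update(
        pdfrw.PdfDict(NeedAppearances=pdfrw.PdfObject("true"))
    )
    for page in template_pdf.pages:
        for annotation in page.Annots or []:
            if annotation.Subtype == "/Widget" and annotation.T:
                name = field_name(annotation.T)
                if name in values:
                    annotation.update(pdfrw.PdfDict(V=str(values[name])))
    pdfrw.PdfWriter().write(output_path, template_pdf)


print(field_name("(Text1)"))
# fill_form("sample.pdf", "filled.pdf", {"Text1": "New Value"})
```

Field names stored as hex strings would not match this simple parenthesis stripping and would need decoding first.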
I have a simple flask function that renders a template with a valid GeoJSON string:
@app.route('/json', methods=['POST'])
def json():
    polygon = Polygon([[[0,1],[1,0],[0,0],[0,1]]])
    return render_template('json.html',string=polygon)
In my json.html file, I am attempting to render this GeoJSON with OpenLayers:
function init(){
    map = new OpenLayers.Map( 'map' );
    layer = new OpenLayers.Layer.WMS( "OpenLayers WMS",
        "http://vmap0.tiles.osgeo.org/wms/vmap0",
        {layers: 'basic'} );
    map.addLayer(layer);
    map.setCenter(new OpenLayers.LonLat(lon, lat), zoom);
    var fc = {{string}}; //Here is the JSON string
    var geojson_format = new OpenLayers.Format.GeoJSON();
    var vector_layer = new OpenLayers.Layer.Vector();
    map.addLayer(vector_layer);
    vector_layer.addFeatures(geojson_format.read(fc));
}
But this fails and the " characters become '. I have tried string formatting as seen in this question, but it didn't work.
EDIT:
I did forget to dump my JSON to an actual string. I'm using the geojson library, so adding the call
dumps(polygon)
takes care of that. However, I still can't parse the GeoJSON in OpenLayers, even though it is a valid string according to geojsonlint.com
This is the Javascript code to create a variable from the string sent from flask:
var geoJson = '{{string}}';
And here's what it looks like in the source page:
'{"type": "Polygon", "coordinates": [[[22.739485934746977, 39.26596659794341], [22.73902517923571, 39.266115931275074], [22.738329551588276, 39.26493626464484], [22.738796023230854, 39.26477459496181], [22.739485934746977, 39.26596659794341]]]}';
I am still having a problem rendering the quote characters.
It looks like you are using shapely, which has a mapping method (http://toblerity.org/shapely/shapely.geometry.html#shapely.geometry.mapping) to create a GeoJSON-like object.
To render JSON, use the tojson filter, which is marked safe (see the safe filter) in recent Flask versions. This matters because Jinja2 in Flask escapes all dangerous symbols by default to protect against XSS.
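The difference can be reproduced outside Flask with plain Jinja2. In this sketch, autoescape=True stands in for Flask's default behavior for HTML templates, and the variable name geo is arbitrary.

```python
import json
from jinja2 import Environment

# autoescape=True mimics Flask's default for .html templates.
env = Environment(autoescape=True)
polygon = {"type": "Polygon", "coordinates": [[[0, 1], [1, 0], [0, 0], [0, 1]]]}

# Passing a pre-dumped string: the quotes are HTML-escaped and the JavaScript breaks.
escaped = env.from_string("var fc = {{ geo }};").render(geo=json.dumps(polygon))

# Passing the object through tojson: the output is valid JavaScript.
usable = env.from_string("var fc = {{ geo|tojson }};").render(geo=polygon)

print(escaped)
print(usable)
```

With tojson the template receives the Python object itself, so there is no need to call dumps in the view or to wrap the placeholder in quotes.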