Image ignored in PDF rendering with jinja2 and weasyprint


I am running an analysis script in Python that produces some variables and an image, which I save as a PNG file. I then use jinja2 to fill them into an HTML template, and WeasyPrint to render the HTML into a PDF file that I save.
There are a number of questions on here about this issue, but none of the suggested solutions have fixed my problem. I need to pass it the absolute path to the image, since the output data is saved on a completely different part of my local disk than the code that generates it. Many of the suggested solutions use something like request.base_url, but that seems to come from Flask (I guess?), i.e. from code that is serving a web app rather than simply building a PDF file.
The function that generates the PDF file from the variables looks like this:
def create_pdf_report(varlist, outfile):
    # Create Jinja environment and get template
    from jinja2 import Environment, FileSystemLoader
    env = Environment(loader=FileSystemLoader('<absolute-path-to-dir-with-template>'))
    template = env.get_template('report_template.html')
    # Render HTML with input variables
    html_out = template.render(varlist)
    # Generate PDF
    from weasyprint import HTML
    HTML(string=html_out).write_pdf(outfile)
The template is this:
<img src="{{ systematics_figure }}" alt="systematics" width="900" height="650">
And the varlist dictionary I pass in looks like this:
fig3_fname = '<absolute-path-to-image>'
varlist = {'systematics_figure': fig3_fname}
While the plain variables I pass in render fine, the image is not displayed; I get its alt text instead. The process finishes without errors, but I can't get the PNG image to display. When I paste the absolute image path directly into the HTML file and open it in my browser, the image appears fine.
What can I do to make this work?
I am on macOS Sierra 10.12.6, using Python 3.7.3.

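It turns out the solution was to add base_url='.' to weasyprint.HTML(). When the HTML comes from a string, WeasyPrint has no document location against which to resolve the image URL; base_url supplies one:
HTML(string=html_out, base_url='.').write_pdf(outfile)
The other part of the fix was to use PNG files instead of PDF figures.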
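A slightly more explicit variant is to hand the template a file:// URI rather than a bare path, so WeasyPrint does not have to guess how to resolve it. The sketch below assumes the figure is on the local disk; the helper image_uri and the template_dir/template_name parameters are hypothetical additions, and base_url='.' is kept from the answer above.

```python
from pathlib import Path


def image_uri(path):
    """Return an absolute file:// URI for a local file (hypothetical helper)."""
    # Path.as_uri() only works on absolute paths, so resolve() the path first.
    return Path(path).resolve().as_uri()


def create_pdf_report(varlist, outfile, template_dir, template_name="report_template.html"):
    # Imports kept inside the function, as in the question's code.
    from jinja2 import Environment, FileSystemLoader
    from weasyprint import HTML

    env = Environment(loader=FileSystemLoader(template_dir))
    html_out = env.get_template(template_name).render(varlist)
    # A string of HTML has no location of its own; base_url gives WeasyPrint
    # one for resolving any relative URLs that remain in the document.
    HTML(string=html_out, base_url=".").write_pdf(outfile)
```

Usage would then be varlist = {'systematics_figure': image_uri(fig3_fname)}, with the template unchanged.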

Related

Saving image of every python execution in a folder with a serial number as file name

I am a beginner in Python and would like to run code that saves an image into a particular directory.
The thing is, I would like to save each image with a serial number so that I can have many images in the directory (each execution produces one image).
plt.savefig('Images/Imageplot.png') ## Image saved to particular folder

For the serial number, you can use the uuid library to generate a random, unique ID for each image. See more here: https://www.geeksforgeeks.org/generating-random-ids-using-uuid-python/
Saving images is a bit more involved; it requires importing the os library.
Here is an example:
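import os
# Directory next to this script where the images are stored
UPLOADED_FILE_DIR_PATH = os.path.join(os.path.dirname(__file__), "static", "uploaded-images")
# my_filename and image_file come from the surrounding web code
# (image_file is an uploaded-file object that has a .save() method)
my_file_path = os.path.join(UPLOADED_FILE_DIR_PATH, my_filename)
image_file.save(my_file_path)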
This is a block of code I used previously for my website, so it may not apply to you, depending on your situation. I personally like this method, but if you are unsatisfied, take a look at this for more options: https://towardsdatascience.com/loading-and-saving-images-in-python-ba5a1f5058fb
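If you want sequential serial numbers rather than the random IDs that uuid produces, one option is to look at the files already in the folder and pick the next unused number. This is a minimal sketch assuming names of the form Imageplot_001.png; the helper next_serial_path is hypothetical, and it is not safe if several processes save images at the same moment.

```python
import os
import re


def next_serial_path(directory, stem="Imageplot", ext=".png"):
    """Return e.g. directory/Imageplot_004.png, one past the highest existing number."""
    os.makedirs(directory, exist_ok=True)
    # Match names like Imageplot_001.png and capture the number
    pattern = re.compile(r"^" + re.escape(stem) + r"_(\d+)" + re.escape(ext) + r"$")
    numbers = []
    for name in os.listdir(directory):
        match = pattern.match(name)
        if match:
            numbers.append(int(match.group(1)))
    next_number = max(numbers, default=0) + 1
    return os.path.join(directory, "{}_{:03d}{}".format(stem, next_number, ext))
```

With matplotlib, the call from the question becomes plt.savefig(next_serial_path("Images")).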

python ghostscript not closing output file

I'm trying to turn PDF files with one or many pages into one image per page. This is very much like the question found here. In fact, I'm trying to use the code from @Idan Yacobi in that post to accomplish this. His code looks like this:
import ghostscript

def pdf2jpeg(pdf_input_path, jpeg_output_path):
    args = ["pdf2jpeg",  # actual value doesn't matter
            "-dNOPAUSE",
            "-sDEVICE=jpeg",
            "-r144",
            "-sOutputFile=" + jpeg_output_path,
            pdf_input_path]
    ghostscript.Ghostscript(*args)
When I run the code I get the following output from python:
##### 238647312 c_void_p(238647312L)
When I look at the folder where the new .jpg image is supposed to be created, there is a file there with the new name. However, when I attempt to open the file, the image preview says "Windows Photo Viewer can't open this picture because the picture is being edited in another program."
It seems that for some reason Ghostscript opened the file and wrote to it, but didn't close it after it was done. Is there any way I can force that to happen? Or, am I missing something else?
I already tried changing the last line above to the code below to explicitly close ghostscript after it was done.
GS = ghostscript.Ghostscript(*args)
GS.exit()

I was having the same problem where the image files were kept open. When I looked into the ghostscript __init__.py file (found at PythonDirectory\Lib\site-packages\ghostscript\__init__.py), I saw that a line in the exit method is commented out.
The gs.exit(self._instance) line is commented out by default, but when you uncomment it, the image files are closed.
def exit(self):
    global __instance__
    if self._initialized:
        print '#####', self._instance.value, __instance__
        if __instance__:
            gs.exit(self._instance)  # uncomment this line
            self._instance = None
        self._initialized = False

I was having this same problem while batch-processing a large number of PDFs, and I believe I've isolated it to an issue in the Python bindings for Ghostscript: as you said, the image file is not properly closed. To bypass this, I switched to an os.system call. Given your example, the function and call would be replaced with:
os.system("gs -dNOPAUSE -sDEVICE=jpeg -r144 -sOutputFile=" + jpeg_output_path + ' ' + pdf_input_path)
You may need to change "gs" to "gswin32c" or "gswin64c" depending on your operating system. This may not be the most elegant solution, but it fixed the problem on my end.
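A variant of that os.system call, sketched with subprocess.run and an argument list so paths containing spaces need no shell quoting. It also adds -dBATCH, which makes Ghostscript exit after processing the file instead of waiting at its interactive prompt; once the process exits, its file handles are released. For multi-page PDFs, an output name containing a %d pattern such as page-%03d.jpg makes Ghostscript write one numbered file per page. The function names are hypothetical; on Windows, pass gswin64c or gswin32c as gs_executable.

```python
import subprocess


def build_gs_command(pdf_input_path, jpeg_output_path, gs_executable="gs", resolution=144):
    """Build the Ghostscript argument list (no shell involved)."""
    return [
        gs_executable,
        "-dNOPAUSE",
        "-dBATCH",  # exit when done instead of waiting at the GS> prompt
        "-sDEVICE=jpeg",
        "-r{}".format(resolution),
        "-sOutputFile=" + jpeg_output_path,
        pdf_input_path,
    ]


def pdf2jpeg(pdf_input_path, jpeg_output_path, gs_executable="gs"):
    # check=True raises CalledProcessError if Ghostscript exits with an error
    subprocess.run(build_gs_command(pdf_input_path, jpeg_output_path, gs_executable), check=True)
```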

My workaround was to install an image printer and have Python print the PDF to it, which creates the desired JPEG image. Here's the code I used:
import win32api

def pdf_to_jpg(pdf_path):
    """
    Turn pdf into jpg image(s) using jpg printer
    :param pdf_path: Path of the PDF file to be converted
    """
    # print pdf to jpg using jpg printer
    tempprinter = "ImagePrinter Pro"
    printer = '"%s"' % tempprinter
    win32api.ShellExecute(0, "printto", pdf_path, printer, ".", 0)

I was having the same problem when I ran into a password-protected PDF: Ghostscript would crash and not close the PDF, which prevented me from deleting it.
Kishan's fix was already applied on my machine, so it didn't help with my problem.
I fixed it by importing GhostscriptError and instantiating an empty Ghostscript before a try/finally block like so:
from os import remove
from ghostscript import GhostscriptError
from ghostscript import Ghostscript
...
# in my decryptPDF function
GS = Ghostscript()
try:
    GS = Ghostscript(*args)
finally:
    GS.exit()
...
# in my function that runs decryptPDF function
try:
    if PDFencrypted(append_file_path):
        decryptPDF(append_file_path)
except GhostscriptError:
    remove(append_file_path)
    # more code to log and handle the skipped file
...

For those who stumble upon this with the same problem: I looked through the python-ghostscript __init__ file and discovered the ghostscript.cleanup() function.
I was able to solve the problem by adding this one-liner at the end of my script (or at the end of the loop):
ghostscript.cleanup()
Hope it helps someone else; it frustrated me for quite a while.

Issue writing temp images to temp pdf in pyramid with reportlabs

I am using Python 3 with Pyramid and ReportLab to generate dynamic PDFs.
I am having an issue writing images into a PDF. I am using ReportLab in a web app to generate a PDF with images, but my images are not stored locally; they are on a remote server. I download them into a local temp directory (they are saved; I have checked). When I add the images to the PDF, the space is allocated, but the image does not show up.
Here is my relevant code (simplified):
# creates pdf in memory
doc = SimpleDocTemplate(pdfName, pagesize=A4)
elements = []
for item in model['items']:
    # image goes here:
    if item['IMAGENAME']:
        response = getImageFromRemoteServer(item['IMAGENAME'])
        dir_filename = directory + item['IMAGENAME']
        if response.status_code == 200:
            with open(dir_filename, 'wb') as f:
                for chunk in response.iter_content():
                    f.write(chunk)
            elements.append(Image(dir_filename, width=2*inch, height=2*inch))
# create and save the pdf
doc.build(elements, canvasmaker=NumberedCanvas)
I have followed the user guide here https://www.reportlab.com/docs/reportlab-userguide.pdf and have tried the approach above, plus embedded images (as the user guide describes in the paragraphs section) and putting the image in a table.
I also looked here: and it did not help me.
My question is really: what is the right way to download an image and put it in a PDF?
EDIT: fixed code indentation
EDIT 2:
Answered: I was finally able to get the images into the PDF. I am not sure what triggered it to work. The only change I know of is that I now use urllib to make the request, which I was not doing before. Here is my working code (simplified for the question; in my code it is more abstracted and encapsulated):
doc = SimpleDocTemplate(pdfName, pagesize=A4)
# array of elements in the pdf
elements = []
for question in model['questions']:
    # image goes here:
    if question['IMAGEFILE']:
        filename = question['IMAGEFILE']
        dir_filename = directory + filename
        url = get_url(settings, filename)
        response = urllib.request.urlopen(url)
        raw_data = response.read()
        f = open(dir_filename, 'wb')
        f.write(raw_data)
        f.close()
        response.close()
        myImage = Image(dir_filename)
        myImage.drawHeight = 2 * inch
        myImage.drawWidth = 2 * inch
        myImage.hAlign = "LEFT"
        elements.append(myImage)
# create and save the pdf
doc.build(elements)

Make your code independent of where the files come from. Separate file/resource retrieval from document generation. Ensure that your toolset works with local files. Encapsulate the code that loads files in a loader class or function; the encapsulation is what matters. I noticed this again this week while looking at Thumbor's loader classes.
If that works, you know reportlab, PIL and your application basically work.
Then make your code work with remote files using URI like http://path/to/remote/files.
Afterwards you can switch from using your fileloader or your httploader depending on environment or use case.
Another option would be to make your code work with local files using URIs like file:///path/to/file.
This way the only thing that changes when switching from local to remote is the URL. You need a Python library that supports both schemes: urllib.request.urlopen from the standard library handles file:// as well as http(s)://, while the requests library is well suited to downloading over HTTP but does not support the file:// scheme out of the box.
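A minimal sketch of such a loader, assuming the standard library is enough: urllib.request.urlopen handles both http(s):// and file:// URLs, so switching between local and remote files only changes the string passed in. The function names load_resource and load_image_buffer are hypothetical.

```python
import io
import urllib.request
from pathlib import Path


def load_resource(location):
    """Return the bytes at an http(s):// or file:// URL, or at a plain local path."""
    if "://" not in location:
        # Plain filesystem path: convert it to an absolute file:// URI
        location = Path(location).resolve().as_uri()
    with urllib.request.urlopen(location) as response:
        return response.read()


def load_image_buffer(location):
    """Wrap the bytes in a file-like object, e.g. for reportlab's Image flowable."""
    return io.BytesIO(load_resource(location))
```

Document generation then only ever sees file-like objects, regardless of where the bytes came from.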

Most probably the lazy parameter is why your first code sample did not render the images. Triggering ReportLab's PDF rendering outside of the context managers for the temporary files could have led to this behaviour.
reportlab/platypus/flowables.py (using version 3.1.8):
class Image(Flowable):
    """an image (digital picture). Formats supported by PIL/Java 1.4 (the Python/Java Imaging Library
       are supported. At the present time images as flowables are always centered horozontally
       in the frame. We allow for two kinds of lazyness to allow for many images in a document
       which could lead to file handle starvation.
       lazy=1 don't open image until required.
       lazy=2 open image when required then shut it.
    """
    _fixedWidth = 1
    _fixedHeight = 1
    def __init__(self, filename, width=None, height=None, kind='direct', mask="auto", lazy=1):
        """If size to draw at not specified, get it from the image."""
        self.hAlign = 'CENTER'
        self._mask = mask
        fp = hasattr(filename,'read')
        if fp:
            self._file = filename
            self.filename = repr(filename)
        ...
The last lines of the excerpt show that you can pass any object that has a read method instead of a filename. This is exactly what a call to urllib.request.urlopen(url) returns. Creating the Image from such an in-memory buffer means you need no write access to the filesystem and have no files to delete after PDF rendering. Since your use case includes retrieving remote resources, memory buffers that support the Python file API could be a much cleaner way to assemble your PDF files, and they also make the code easier to read:
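import io
import urllib.request
from contextlib import closing

from reportlab.lib.pagesizes import A4
from reportlab.lib.units import inch
from reportlab.platypus import Image, SimpleDocTemplate

doc = SimpleDocTemplate(pdfName, pagesize=A4)
# array of elements in the pdf
elements = []
for question in model['questions']:
    # download image and create Image from file-like object
    if question['IMAGEFILE']:
        filename = question['IMAGEFILE']
        image_url = get_url(settings, filename)
        with closing(urllib.request.urlopen(image_url)) as image_file:
            # Copy the bytes into memory: Image loads lazily by default, and the
            # response is already closed when doc.build() draws the image
            image_data = io.BytesIO(image_file.read())
        myImage = Image(image_data, width=2*inch, height=2*inch)
        myImage.hAlign = "LEFT"
        elements.append(myImage)
# create and save the pdf
doc.build(elements)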
References
Coding with context managers

ipython notebook - uploading from and saving to subdirectories?

Change IPython working directory
Inserting image into IPython notebook markdown
Hi, I've read the two links above, and the second seems most relevant. What the person describes, simply referring to the subdirectory, doesn't work for me. For instance, I have an image 'gephi.png' at '/Graphs/gephi.png'.
But when I write the following
from IPython.display import Image
path = "/Graphs/gephi.png"
i = Image(path)
i
No image pops up. Yup, no error either; nothing appears except an empty square box.
Clarification:
When I move the image to the regular directory, the image pops up fine.
My only code change is path = "gephi.png"

IPython's Image display object takes three kinds of arguments.
The first is raw image data (e.g. the result of open(filename, "rb").read()):
with open("Graphs/graph.png", "rb") as f:
data = f.read()
Image(data=data)
The second model is to load an image from a filename. This is functionally the same as above, but IPython does the reading from the file:
Image(filename="Graphs/graph.png")
The third form is passing URLs. External URLs can be used, but relative URIs will serve files relative to the notebook's own directory:
Image(url="Graphs/graph.png")
Where this can get confusing is if you don't tell IPython which one of these you are specifying, and you just pass the one argument positionally:
Image("Graphs/graph.png")
IPython tries to guess what you mean in this case:
1. if it looks like a path and points to an existing file, use it as a filename
2. if it looks like a URL, use it as a URL
3. otherwise, fall back on embedding the string as raw PNG data
Item #3 is the source of most of the confusion. If you pass it a filename that doesn't exist, you will get a broken image:
Image("/Graphs/graph.png")
Note that URLs to local files must be relative. Absolute URLs will generally be wrong:
Image(url="/Graphs/graph.png")
An example notebook illustrating these things.
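In the question's case, the fix is to drop the leading slash: "/Graphs/gephi.png" points at a Graphs folder at the root of the filesystem, while "Graphs/gephi.png" is relative to the notebook's working directory. A small sketch that avoids the silent fallback by checking the file first and then passing it explicitly as filename=; the helper notebook_image is hypothetical and needs IPython at display time.

```python
from pathlib import Path


def notebook_image(relative_path):
    """Show an image stored under the notebook's working directory (hypothetical helper)."""
    path = Path(relative_path)
    if not path.is_file():
        # Fail loudly instead of letting Image() treat the string as raw data
        raise FileNotFoundError("{} does not exist".format(path.resolve()))
    from IPython.display import Image  # requires IPython, i.e. a notebook environment
    return Image(filename=str(path))
```

In a notebook cell, notebook_image("Graphs/gephi.png") then either displays the image or tells you exactly which path it looked for.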

Convert SVG to PDF (svglib + reportlab not good enough)

I'm creating some SVGs in batches and need to convert those to a PDF document for printing. I've been trying to use svglib and its svg2rlg method but I've just discovered that it's absolutely appalling at preserving the vector graphics in my document. It can barely position text correctly.
My dynamically-generated SVG is well formed and I've tested svglib on the raw input to make sure it's not a problem I'm introducing.
So what are my options past svglib and ReportLab? It either has to be free or very cheap as we're already out of budget on the project this is part of. We can't afford the 1k/year fee for ReportLab Plus.
I'm using Python but at this stage, I'm happy as long as it runs on our Ubuntu server.
Edit: Tested Prince. Better but it's still ignoring half the document.

I use Inkscape for this. In your Django view, do something like:
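from subprocess import Popen, PIPE

def svg_to_pdf(your_svg_input, your_pdf_output):
    # --export-pdf is Inkscape 0.92 syntax; Inkscape 1.x uses --export-filename instead
    x = Popen(['/usr/bin/inkscape', your_svg_input,
               '--export-pdf=%s' % your_pdf_output], stdout=PIPE, stderr=PIPE)
    try:
        waitForResponse(x)
    except OSError:
        return False
    return True

def waitForResponse(x):
    out, err = x.communicate()
    # Non-zero means failure; negative means Inkscape was killed by a signal
    if x.returncode != 0:
        r = "Popen returncode: " + str(x.returncode)
        raise OSError(r)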
You may need to pass Inkscape all the font files referenced in your .svg as parameters, so keep that in mind if your text does not appear correctly in the .pdf output.

CairoSVG is the one I am using:
import cairosvg
cairosvg.svg2pdf(url='image.svg', write_to='image.pdf')
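Since the SVGs are produced in batches, the one-liner above can be wrapped in a loop. This sketch assumes all SVGs sit in one folder and each should become its own PDF; the function name convert_svg_dir is hypothetical, and combining the results into a single print document would be a separate step.

```python
from pathlib import Path


def convert_svg_dir(svg_dir, pdf_dir):
    """Convert every *.svg in svg_dir to a PDF of the same name in pdf_dir."""
    import cairosvg  # third-party; also needs the Cairo C library installed

    out_dir = Path(pdf_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for svg_path in sorted(Path(svg_dir).glob("*.svg")):
        pdf_path = out_dir / (svg_path.stem + ".pdf")
        cairosvg.svg2pdf(url=str(svg_path), write_to=str(pdf_path))
        written.append(pdf_path)
    return written
```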

rst2pdf uses reportlab for generating PDFs. It can use inkscape and pdfrw for reading PDFs.
pdfrw itself has some examples that show reading PDFs and using reportlab to output.
Addressing the comment by Martin below (I can edit this answer, but do not have the reputation to comment on a comment on it...):
reportlab knows nothing about SVG files. Some tools, like svg2rlg, attempt to recreate an SVG image into a PDF by drawing them into the reportlab canvas. But you can do this a different way with pdfrw -- if you can use another tool to convert the SVG file into a PDF image, then pdfrw can take that converted PDF, and add it as a form XObject into the PDF that you are generating with reportlab. As far as reportlab is concerned, it is really no different than placing a JPEG image.
Some tools will do terrible things to your SVG files (rasterizing them, for example). In my experience, inkscape usually does a pretty good job, and leaves them in a vector format. You can even do this headless, e.g. "inkscape my.svg -A my.pdf".
The entire reason I wrote pdfrw in the first place was for this exact use-case -- being able to reuse vector images in new PDFs created by reportlab.
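A sketch of that pdfrw approach, based on pdfrw's reportlab helpers pagexobj and makerl. It assumes the SVG has already been converted to a PDF (for example with inkscape as above) and places one page of it, still as vector data, on a new reportlab canvas. The function name and the position/scale parameters are hypothetical.

```python
def place_pdf_page(src_pdf, out_pdf, x=72, y=72, scale=1.0, page_index=0):
    """Draw one page of an existing PDF, as vector data, onto a new reportlab PDF."""
    # Third-party imports kept local so the sketch can be imported without them
    from pdfrw import PdfReader
    from pdfrw.buildxobj import pagexobj
    from pdfrw.toreportlab import makerl
    from reportlab.pdfgen.canvas import Canvas

    # Wrap the source page (e.g. an SVG converted by inkscape) as a form XObject
    form = pagexobj(PdfReader(src_pdf).pages[page_index])
    canvas = Canvas(out_pdf)
    canvas.saveState()
    canvas.translate(x, y)  # lower-left corner of the placed page, in points
    canvas.scale(scale, scale)
    # makerl converts the pdfrw object for reportlab and returns the form's name
    canvas.doForm(makerl(canvas, form))
    canvas.restoreState()
    canvas.showPage()
    canvas.save()
```

As the answer says, reportlab treats the placed page much like an embedded image, but it stays vector.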

Just so you know, and for anyone with this issue in the future, I found a solution for this problem:
# I only installed svg2rlg, not svglib (svg2rlg is inside svglib as well)
import svg2rlg
# Import of the canvas
from reportlab.pdfgen import canvas
# Import of the renderer (image part)
from reportlab.graphics import renderPDF
rlg = svg2rlg.svg2rlg("your_img.svg")
c = canvas.Canvas("example.pdf")
c.setTitle("my_title_we_dont_care")
# Generation of the first page
# You have a last option on this function,
# about the boundary but you can leave it as default.
renderPDF.draw(rlg, c, 80, 740 - rlg.height)
renderPDF.draw(rlg, c, 60, 540 - rlg.height)
c.showPage()
# Generation of the second page
renderPDF.draw(rlg, c, 50, 740 - rlg.height)
c.showPage()
# Save
c.save()
Experiment with the position (80, 740 - rlg.height); it only determines where the drawing is placed on the page.
If the code doesn't work, you can look at the renderPDF module in the reportlab library.
reportlab has a function that creates a PDF directly from your drawing:
renderPDF.drawToFile(rlg, "example.pdf", "title")
You can open its source and read it; it is not very complicated. The code above is derived from that function.
