HTML to PDF conversion in app engine python - python

My website has a lot of dynamically generated HTML content, and I would like to give users a way to save it in PDF format. Any ideas on how this can be done? I tried the xhtml2pdf library but couldn't get it to work. I also tried ReportLab, but with it the PDF details have to be entered manually. Is there a library that converts HTML content to PDF and works on App Engine?

You need to copy all dependencies into your GAE project folder:
html5lib
reportlab
six
xhtml2pdf
Then you can use it like this:
from xhtml2pdf import pisa
from cStringIO import StringIO
content = StringIO('html goes here')
output = StringIO()
pisa.log.setLevel('DEBUG')
pdf = pisa.CreatePDF(content, output, encoding='utf-8')
pdf_data = pdf.dest.getvalue()
Some useful info that I googled just for you:
http://www.prahladyeri.com/2013/11/how-to-generate-pdf-in-python-for-google-app-engine/
https://github.com/danimajo/pineapple_pdf

Related

Can't convert HTML to PDF using pdfkit (empty file)

I want to convert a HTML file to PDF
Example Input
The HTML file from Google Drive looks good, can see content in my browser.
Code
import pdfkit # convert html to pdf
pdfkit.from_file('income_contract_01.html', 'res.pdf')
Issue
I expect to see data in res.pdf. But instead I see an empty file:
no text
but the same number of lines
Environment and versions used:
python 3.9
pdfkit version 1.0.0
wkhtmltopdf is the newest version (0.12.2.4-1)
OS Ubuntu 16.04
How can I fix this? I don't see any error messages.
Update: I tried specifying the path to the wkhtmltopdf binary in the configuration, but it didn't help:
import pdfkit # convert html to pdf v1
path = "/usr/bin/wkhtmltopdf"
config = pdfkit.configuration(wkhtmltopdf=path)
pdfkit.from_file('income_contract_01.html', 'res123.pdf', configuration=config)
I solved it by telling pdfkit where wkhtmltopdf is installed.
After downloading wkhtmltopdf, put the path to wkhtmltopdf.exe in the path_wkhtmltopdf variable in the code below.
import pdfkit
path_wkhtmltopdf = r"C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe"
config = pdfkit.configuration(wkhtmltopdf=path_wkhtmltopdf)
pdfkit.from_url("income_contract.html", "res.pdf", configuration=config)
Result: res.pdf
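Both snippets above depend on pdfkit finding the wkhtmltopdf executable. As a minimal sketch, the lookup can be made explicit so that a missing binary fails with a clear message. The helper names find_wkhtmltopdf and convert are hypothetical, and this only checks the path; it does not by itself explain a PDF that renders without text.

```python
import os
import shutil


def find_wkhtmltopdf(candidates=()):
    """Return the first usable wkhtmltopdf path, or None if none is found."""
    # Explicitly supplied paths take priority over whatever is on PATH.
    for path in candidates:
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    return shutil.which("wkhtmltopdf")


def convert(html_path, pdf_path, candidates=()):
    """Convert a local HTML file to PDF, failing early if the binary is missing."""
    import pdfkit  # imported lazily so find_wkhtmltopdf works without pdfkit

    binary = find_wkhtmltopdf(candidates)
    if binary is None:
        raise RuntimeError("wkhtmltopdf executable not found")
    config = pdfkit.configuration(wkhtmltopdf=binary)
    pdfkit.from_file(html_path, pdf_path, configuration=config)
    return pdf_path
```

For example, convert('income_contract_01.html', 'res.pdf', candidates=['/usr/bin/wkhtmltopdf']) would use the path from the question if it exists and fall back to PATH otherwise.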

How to download Flickr images using photos url (does not contain .jpg, .png, etc.) using Python

I want to use Python to download images from Flickr, given links of the following type:
https://www.flickr.com/photos/66176388@N00/2172469872/
https://www.flickr.com/photos/clairity/798067744/
This data is obtained from xml file given at https://snap.stanford.edu/data/web-flickr.html
Is there a Python script or some other way to download the images automatically?
Thanks.
I looked for answers in other sources and compiled the following:
import re
from urllib import request

def download(url, save_name):
    # fetch the "sizes" page and decode it to text
    html = request.urlopen(url).read()
    html = html.decode('utf-8')
    # the large rendition of the photo ends in "_b.jpg"
    img_url = re.findall(r'https:[^" \\:]*_b\.jpg', html)[0]
    print(img_url)
    with open(save_name, "wb") as fp:
        fp.write(request.urlopen(img_url).read())

download('https://www.flickr.com/photos/clairity/798067744/sizes/l/', 'image.jpg')
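The answer's code indexes the first regex match directly, so a page without a "_b.jpg" link raises an IndexError. A minimal variation, assuming the same pattern is still what the page contains, separates the matching step and reports a missing image explicitly. The function name find_large_image_url and the sample markup in the usage note are hypothetical.

```python
import re
from urllib import request

# Same pattern as in the answer: an https URL ending in "_b.jpg".
LARGE_JPG = re.compile(r'https:[^" \\:]*_b\.jpg')


def find_large_image_url(html):
    """Return the first "_b.jpg" URL found in html, or None if there is none."""
    match = LARGE_JPG.search(html)
    return match.group(0) if match else None


def download(url, save_name):
    """Fetch a Flickr sizes page and save the large JPEG it links to."""
    html = request.urlopen(url).read().decode("utf-8")
    img_url = find_large_image_url(html)
    if img_url is None:
        raise ValueError("no _b.jpg image URL found at " + url)
    with open(save_name, "wb") as fp:
        fp.write(request.urlopen(img_url).read())
    return img_url
```

Keeping the matching step separate means it can be tried on a saved copy of a page, for instance find_large_image_url('<img src="https://live.staticflickr.com/1/2_abc_b.jpg">'), without making any network request.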

weasyprint does not load flask images

Hello friends. I have code that generates a PDF with WeasyPrint from Flask. It works and produces the PDF, but the HTML I am rendering contains two images and they do not show up in the PDF. I have the following code:
import os
from flask import render_template
from weasyprint import HTML
image = "/static/images/logo-ori.png"
name = "Pepito perez"
print_html = render_template('cert/certificado.html', img=image)
Up to this point the image renders without a problem, but when I convert the page to PDF it is no longer rendered:
cert = os.path.join(url_direct, "certificado.pdf")
pdf = HTML(string=print_html)
pdf.write_pdf(cert)
The PDF is stored on the server, which is what I want, and then Flask redirects to the start page. I also looked at Flask-WeasyPrint, but with it I could not store the PDF on the server. I hope someone can help me with either of the two approaches. Thank you in advance.
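One possible cause, offered as an assumption since no answer is given here: WeasyPrint receives the HTML as a string, so it has no base URL against which to resolve a site-relative path such as /static/images/logo-ori.png. WeasyPrint's HTML class accepts a base_url argument for this case. Another option, sketched below, is to pass the template an absolute file:// URI instead. The helper name static_file_uri and the application directory in the usage note are hypothetical.

```python
from pathlib import Path


def static_file_uri(app_root, url_path):
    """Map a site-relative URL such as "/static/images/x.png" to a file:// URI."""
    # Strip the leading slash so the path is joined under app_root
    # instead of replacing it.
    local_path = Path(app_root).resolve() / url_path.lstrip("/")
    return local_path.as_uri()
```

In the question's code this would mean rendering the template with img=static_file_uri(app_root, image), where app_root is the directory that contains the static folder (in a Flask app, app.root_path is one candidate).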

Python Tika cannot parse pdf from url

I am using Python to parse an online PDF for later use. My code is below.
from tika import parser
import requests
import io
url = 'https://www.whitehouse.gov/wp-content/uploads/2017/12/NSS-Final-12-18-2017-0905.pdf'
response = requests.get(url)
with io.BytesIO(response.content) as open_pdf_file:
    pdfFile = parser.from_file(open_pdf_file)
print(pdfFile)
However, it shows
AttributeError: '_io.BytesIO' object has no attribute 'decode'
I have taken an example from How can i read a PDF file from inline raw_bytes (not from file)?
In the example, it is using PyPDF2. But I need to use Tika as Tika has a better result than PyPDF2.
Thank you for helping.
To use tika you need to have Java 8 installed. The code to retrieve and print the contents of a PDF is as follows:
from tika import parser
url = 'https://www.whitehouse.gov/wp-content/uploads/2017/12/NSS-Final-12-18-2017-0905.pdf'
pdfFile = parser.from_file(url)
print(pdfFile["content"])

Python script to save webpage and rename it while saving (save as - command)

Hi, I searched a lot but found no relevant results on how to save a webpage using Python 2.6 and rename it while saving.
Better to use the requests library:
import requests
pagelink = "http://www.example.com"
page = requests.get(pagelink)
with open('/path/to/file/example.html', "wb") as file:
    # write the raw bytes so pages with non-ASCII text do not raise an encoding error
    file.write(page.content)
You may want to use the urllib(2) package to access the webpage, and then save the file object to the desired location (os.path).
It should look something like this:
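import urllib2, os
pagelink = "http://www.example.com"
page = urllib2.urlopen(pagelink)
# save under a name of your choice; the URL itself is not a valid file name
with open(os.path.join('/(full)path/to/Documents', 'example.html'), "wb") as file:
    # urlopen returns a file-like object, so read its contents before writing
    file.write(page.read())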
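A URL contains characters such as ':' and '/' that do not belong in a file name, so the name to save under has to be chosen separately from the link. As a minimal sketch in Python 3 (the answers above target Python 2, where urlparse lives in a different module), a name can be derived from the URL when no explicit one is supplied. The function name filename_for and its fallback rules are hypothetical choices.

```python
import os
from urllib.parse import urlparse


def filename_for(url, default="index.html"):
    """Derive a file name from a URL, falling back to the host or a default."""
    parsed = urlparse(url)
    # Reuse the last path segment when the URL has one, e.g. "page.html".
    name = os.path.basename(parsed.path)
    if name:
        return name
    # Otherwise build a name from the host, replacing the port separator.
    host = parsed.netloc.replace(":", "_")
    return host + ".html" if host else default
```

The result can then be joined to the target directory with os.path.join, in the same way as the fixed file name in the snippets above.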
