jinja template engine with python dataframe group by problem - python

I'm using the Jinja2 template engine together with pandas to generate a PDF with pdfkit. The PDF is generated without errors, but I want to group the data by studentId. Please find the expected output below:
Please find the Python code below:
import pandas as pd
import numpy as np
from jinja2 import Environment, FileSystemLoader
import pdfkit
from datetime import datetime
env = Environment(loader=FileSystemLoader('/Users/macuser/Downloads/pdfkit'))
template = env.get_template("report.html")
df = pd.read_csv('/Users/macuser/Downloads/pdfkit/students.csv')
template_vars = {
    "hData": df.to_html(),
}
html_out = template.render(template_vars)
options = {
    'orientation': 'Landscape',
    'margin-top': '0.25in',
    'margin-right': '0.25in',
    'margin-bottom': '0.25in',
    'margin-left': '0.25in'
}
pdfkit.from_string(html_out, '/Users/macuser/Downloads/pdfkit/studentReport.pdf', options=options)
The html template to replace is very simple:
<html>
<head></head>
<body>
<h3>Students Report</h3>
{{ hData }}
</body>
</html>
This is the final output which I get.
How can I group by student ID as shown in the screenshot above? Also, how can I remove the extra index column (the row numbers) from the DataFrame output?
thanks
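One possible approach, sketched under assumptions: the screenshot of the expected output is not available here, so this assumes "grouped" means one small table per student. The column names other than studentId and the sample rows are made up for illustration. Passing index=False to to_html() is what drops the extra column of row numbers.

```python
import pandas as pd

# Hypothetical sample data; the real students.csv columns are not shown in the question.
df = pd.DataFrame({
    "studentId": [1, 1, 2],
    "subject": ["Math", "Physics", "Math"],
    "score": [90, 85, 70],
})

sections = []
for student_id, group in df.groupby("studentId"):
    # index=False leaves out the DataFrame index (the extra numbers column)
    table_html = group.drop(columns="studentId").to_html(index=False)
    sections.append(f"<h4>Student {student_id}</h4>\n{table_html}")

# One HTML string with a heading and a table per student
h_data = "\n".join(sections)
print(h_data)
```

The resulting string could then be passed to the template as hData in place of df.to_html().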

Related

Webscraping and fastapi issues

I'm trying to learn about scraping and I found an article about making an API by scraping information from a site then using fastapi to serve it. It's a simple project but the way it was written on the blog page makes it really confusing.
I'm thinking they left stuff out but I don't know what that would be. Here's the link to the article: https://www.scien.cx/2022/04/26/creating-a-skyrim-api-with-python-and-webscraping/
Here is the code that I'm trying to run. First I'm in a directory I named skypi. I made a file called sky-scrape.py. Here is the code:
from bs4 import BeautifulSoup
import requests
import json
def getLinkData(link):
    return requests.get(link).content
factions = getLinkData("https://elderscrolls.fandom.com/wiki/Factions_(Skyrim)")
data = []
soup = BeautifulSoup(factions, 'html.parser')
table = soup.find_all('table', attrs={'class': 'wikitable'})
for wikiTable in table:
    table_body = wikiTable.find('tbody')
    rows = table_body.find_all('tr')
    for row in rows:
        cols = row.find_all('td')
        cols = [ele.text.strip() for ele in cols]
        # Get rid of empty values
        data.append([ele for ele in cols if ele])
cleanData = list(filter(lambda x: x != [], data))
skyrim_data[html] = cleanData
This last line doesn't work; it throws an error saying skyrim_data is not defined. If I just write it as
skyrim_data = cleanData
then here's the biggest issue: I have another file, run.py, and I want to import the data from sky-scrape.py.
Here is that file:
from fastapi import FastAPI
from fastapi.responses import HTMLResponse
from sky-scrape import skyrim_data
app = FastAPI()
@app.get("/", response_class=HTMLResponse)
def home():
    return("""
<html>
<head>
<title>Skyrim API</title>
</head>
<body>
<h1>API DO SKYRIM</h1>
<h2>Rotas disponíveis:</h2>
<ul>
<li>/factions</li>
</ul>
</body>
</html>
""")
@app.get("/factions")
def factions():
    return skyrim_data["Factions"]
The from sky-scrape import skyrim_data doesn't work so I'm not sure what to do at this point. How do I get this script to work correctly?
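A likely cause, offered as a sketch rather than a confirmed fix: a hyphen is not valid in a Python identifier, so `from sky-scrape import ...` is a syntax error. The simplest route is probably to rename the file to sky_scrape.py. If the file name has to stay, importlib.import_module accepts the name as a string. The example below builds a throwaway sky-scrape.py with made-up data so it can run on its own. Note that skyrim_data["Factions"] in run.py also assumes skyrim_data is a dict, for example skyrim_data = {"Factions": cleanData}.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create a throwaway module whose file name contains a hyphen,
# standing in for the question's sky-scrape.py (the data is made up).
module_dir = Path(tempfile.mkdtemp())
(module_dir / "sky-scrape.py").write_text(
    'skyrim_data = {"Factions": [["Companions"], ["Thieves Guild"]]}\n'
)
sys.path.insert(0, str(module_dir))

# The import statement cannot spell this name, but import_module
# takes the module name as a plain string.
sky_scrape = importlib.import_module("sky-scrape")
skyrim_data = sky_scrape.skyrim_data
print(skyrim_data["Factions"])
```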

Output HTML using a template and JSON for data

What's a good way for me to output (AMP compliant) HTML using a template and JSON for the data? I have a nice little python script:
import requests
import json
from bs4 import BeautifulSoup
url = requests.get('https://www.perfectimprints.com/custom-promos/20492/Beach-Balls.html')
source = BeautifulSoup(url.text, 'html.parser')
products = source.find_all('div', class_="product_wrapper")
infos = source.find_all('div', class_="categories_wrapper")
def get_category_information(category):
    category_name = category.find('h1', class_="category_head_name").text
    return {
        "category_name": category_name.strip()
    }
category_information = [get_category_information(info) for info in infos]
with open("category_info.json", "w") as write_file:
    json.dump(category_information, write_file)
def get_product_details(product):
    product_name = product.find('div', class_="product_name").a.text
    sku = product.find('div', class_="product_sku").text
    product_link = product.find('div', class_="product_image_wrapper").find("a")["href"]
    src = product.find('div', class_="product_image_wrapper").find('a').find("img")["src"]
    return {
        "title": product_name,
        "link": product_link.strip(),
        "sku": sku.strip(),
        "src": src.strip()
    }
all_products = [get_product_details(product) for product in products]
with open("products.json", "w") as write_file:
    json.dump({'items': all_products}, write_file)
print("Success")
Which generates the JSON files I need. However, I now need to use those JSON files and input it into my template (gist) everywhere it says {{ JSON DATA HERE }}.
I'm not even sure where to start. I'm most comfortable with JavaScript so I'd like to use that if possible. I figure something involving Node.js.
Here's how you can use a template engine on its own to render a template into plain HTML:
from jinja2 import Template
me = Template('<h1>Hello {{ x }}</h1>')
html = me.render({'x': 'world'})
html is your rendered HTML string. Alternatively, you can render from a file:
from jinja2 import Template
with open('your_template.html') as file_:
    template = Template(file_.read())
html = template.render(your_dict)
Since you're going to generate HTML one way or another, using a template engine will save you a lot of time. You can also use constructs such as {% for item in list %}, which will greatly simplify your task.
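As a rough illustration of the {% for %} idea applied to the products JSON from the question: the sketch below assumes the same {'items': [...]} shape that the scraper writes to products.json, but uses an inline string with made-up values instead of reading the file, and a tiny inline template instead of the real AMP template.

```python
import json
from jinja2 import Template

# Stand-in for the contents of products.json; the values are made up.
products_json = (
    '{"items": [{"title": "Beach Ball", "link": "/p/1", '
    '"sku": "BB-1", "src": "/img/1.png"}]}'
)
data = json.loads(products_json)

# A minimal template: one list entry per product
template = Template(
    "<ul>\n"
    "{% for item in items %}"
    '<li><a href="{{ item.link }}">{{ item.title }}</a> ({{ item.sku }})</li>\n'
    "{% endfor %}"
    "</ul>"
)
html = template.render(items=data["items"])
print(html)
```

With the real files, json.load on the opened products.json and a template read from disk would take the place of the two inline strings.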

How to embed images to pandas DataFrame and show in flask webpage

I want to show a table with one column of images and the remaining columns as text on a Flask page. I can display the table with images in a Jupyter notebook, but I cannot export it as HTML that can be embedded in Flask to show the images. Instead, I just see the text <img src="http://url.to.image.png"/>.
import pandas as pd
from IPython.display import Image, HTML
df['IMAGE'] = df['IMGLINK'].apply(lambda x: '<img src="{}"/>'.format(x) if x else '')
pd.set_option('display.max_colwidth', -1)
HTML(df.to_html(escape=False))
In my Flask app.py code, I have the following:
@app.route('/result', methods=['POST'])
def result():
    pd.set_option('display.max_colwidth',-1)
    df = pd.read_csv('DATA_1.csv')
    df['IMAGE'] = df['IMGLINK'].apply(lambda x: '<img src="{}"/>'.format(x) if x else '')
    df_html = dfresult.to_html(index=False)#line_width=60, col_space=70
    return render_template('result.html', datatable=df_html)
In my result.html, I have line {{ datatable | safe }}.
Image is not rendering, because the markup is encoded as text, not as html
If you are looking to render an image from a DataFrame, you need to use the Markup().unescape() method.
html = Markup(dataframeName.to_html(classes='data')).unescape()
This will convert &lt; to <, and likewise turn the other escaped symbols back into tag characters.
If you want to see why this is happening, press F12 in Chrome, select the tag, right-click and choose "Edit as HTML". You will see that your image tag looks something like:
&lt;img src='url_path'&gt;
however, it should look like: <img src='url_path'>
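To make the escaping visible, here is a small standard-library sketch. It uses html.escape and html.unescape rather than Markup, purely to show the round trip. The underlying cause in the question is likely that DataFrame.to_html escapes cell contents by default (escape=True), which is why the notebook version with escape=False showed images while the Flask version did not.

```python
import html

img_tag = '<img src="http://url.to.image.png"/>'

# Escaping turns the tag characters into entities, so a browser shows text
escaped = html.escape(img_tag)
print(escaped)

# Unescaping turns the entities back into a real tag
restored = html.unescape(escaped)
print(restored)
```

Passing escape=False to to_html in the Flask view, as the notebook code already does, should avoid the need to unescape afterwards.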

Getting <script> and <div> tags from Plotly using Python

I was wondering if anyone knew a good way (preferably a built in method, but I'm open to writing my own of course) to get the <script> and <div> tags from the HTML output of the Plotly offline client.
I'm already familiar with bokeh and really enjoy using it for 2D visualization, but would really like to integrate Plotly as well for its 3D visualization capabilities.
Let me know if you need any extra details about the project.
If you call:
plotly.offline.plot(data, filename='file.html')
It creates a file named file.html and opens it up in your web browser. However, if you do:
plotly.offline.plot(data, include_plotlyjs=False, output_type='div')
the call will return a string with only the div required to create the chart, which you can store in whatever variable you desire (and not to disk).
I just tried it and it returned, for a given chart that I was doing:
<div id="82072c0d-ba8d-4e86-b000-0892be065ca8" style="height: 100%; width: 100%;" class="plotly-graph-div"></div>
<script type="text/javascript">window.PLOTLYENV=window.PLOTLYENV || {};window.PLOTLYENV.BASE_URL="https://plot.ly";Plotly.newPlot("82072c0d-ba8d-4e86-b000-0892be065ca8",
[{"y": ..bunch of data..., "x": ..lots of data.., {"showlegend": true, "title": "the title", "xaxis": {"zeroline": true, "showline": true},
"yaxis": {"zeroline": true, "showline": true, "range": [0, 22.63852380952382]}}, {"linkText": "Export to plot.ly", "showLink": true})</script>
Notice how it's just a small fragment of HTML that you are supposed to embed in a bigger page. For that I use a standard template engine like Jinja2.
With this you can create one html page with several charts arranged the way you want, and even return it as a server response to an ajax call, pretty sweet.
Update:
Remember that you'll need to include the plotly js file for all these charts to work.
You could include <script src="https://cdn.plot.ly/plotly-latest.min.js"></script> just before putting the div you got. If you put this js at the bottom of the page, the charts won't work.
With Plotly 4, use plotly.io.to_html:
import plotly
# Returns a `<div>` and `<script>`
plotly.io.to_html(figure, include_plotlyjs=False, full_html=False)
# Returns a full standalone HTML
plotly.io.to_html(figure)
Reference: https://plotly.com/python-api-reference/generated/plotly.io.to_html.html
Apologies for the necro-answer. I really wanted to add a comment to the answer Fermin Silva left behind (https://stackoverflow.com/a/38033016/2805700), but my long-standing lurker reputation prevents me.
Anyhow, I had a similar need and encountered an issue with plotly 2.2.2
plotly.offline.plot(data, include_plotlyjs=False, output_type='div')
The include_plotlyjs parameter was being ignored when outputting to a div. Based on the comments above, I found a workaround. Basically let plotly plot to file, which does respect the include_plotlyjs parameter. Load into beautiful soup and inject the link to the latest plotly.js on the cdn.
import plotly
import bs4
# return as html fragment
# the include_plotlyjs argument seems to be
# ignored as it's included regardless when outputting to div
# found an open issue on here - https://github.com/plotly/plotly.py/issues/1043
plotly.offline.plot(
    plot_output,
    filename = filename,
    config = plot_config,
    include_plotlyjs = False,
    auto_open = False,
)
# load the file
with open(filename) as inf:
    txt = inf.read()
# name the parser explicitly to avoid BeautifulSoup's "no parser specified" warning
soup = bs4.BeautifulSoup(txt, "html.parser")
# add in the latest plot-ly js as per https://stackoverflow.com/a/38033016/2805700
js_src = soup.new_tag("script", src="https://cdn.plot.ly/plotly-latest.min.js")
# insert it into the document
soup.head.insert(0, js_src)
# save the file again
with open(filename, "w") as outf:
    outf.write(str(soup))
Cheers

pdfkit headers and footers

I've been searching the web for examples of people using the pdfkit (python wrapper) in implementing headers and footers and could not find any examples.
Would anyone be able to show some examples of how to implement the options in wkhtmltopdf using the pdfkit python wrapper?
I'm using it only with headers but I think that it will work the same with footers.
You need to have a separate HTML file for the header.
header.html
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
</head>
<body>
Code of your header goes here.
</body>
</html>
Then you can use it like this in Python:
import pdfkit
pdfkit.from_file('path/to/your/file.html', 'out.pdf', {
    '--header-html': 'path/to/header.html'
})
The tricky part if you use some backend like Django and want to use templates is that you can't pass the header html as rendered string. You need to have a file.
This is what I do to render PDFs with Django.
import os
import tempfile
import pdfkit
from django.template.loader import render_to_string
def render_pdf(template, context, output, header_template=None):
    """
    Simple function for easy printing of pdfs from django templates
    Header template can also be set
    """
    html = render_to_string(template, context)
    options = {
        '--load-error-handling': 'skip',
    }
    try:
        if header_template:
            with tempfile.NamedTemporaryFile(suffix='.html', delete=False) as header_html:
                options['header-html'] = header_html.name
                header_html.write(render_to_string(header_template, context).encode('utf-8'))
        return pdfkit.from_string(html, output, options=options)
    finally:
        # Ensure temporary file is deleted after finishing work
        if header_template:
            os.remove(options['header-html'])
In my example I create a temporary file where I put the rendered content. The important part is that the temporary file's name needs to end with .html and that the file has to be deleted manually.
Improving on @V Stoykov's answer, as it helped me with Flask: the render function with a custom header in Flask looks as follows:
import os
import tempfile
import pdfkit
from flask import render_template, make_response
@app.route('/generate_pdf')
def render_pdf_custom_header(foo, bar):
    main_content = render_template('main_pdf.html', foo=foo)
    options = {
        '--encoding': "utf-8"
    }
    add_pdf_header(options, bar)
    add_pdf_footer(options)
    try:
        pdf = pdfkit.from_string(main_content, False, options=options)
    finally:
        os.remove(options['--header-html'])
        os.remove(options['--footer-html'])
    response = build_response(pdf)
    return response
def add_pdf_header(options, bar):
    with tempfile.NamedTemporaryFile(suffix='.html', delete=False) as header:
        options['--header-html'] = header.name
        header.write(
            render_template('header.html', bar=bar).encode('utf-8')
        )
    return
def add_pdf_footer(options):
    # same behaviour as add_pdf_header but without passing any variable
    return
def build_response(pdf):
    response = make_response(pdf)
    response.headers['Content-Type'] = 'application/pdf'
    filename = 'pdf-from-html.pdf'
    response.headers['Content-Disposition'] = ('attachment; filename=' + filename)
    return response
Notice that I used the '--header-html' and '--footer-html' notation as it matches the wkhtmltopdf options format.
options = {
    'page-size': 'Letter',
    'margin-top': '0.9in',
    'margin-right': '0.9in',
    'margin-bottom': '0.9in',
    'margin-left': '0.9in',
    'encoding': "UTF-8",
    'header-center': 'YOUR HEADER',
    'custom-header' : [
        ('Accept-Encoding', 'gzip')
    ],
    'no-outline':None
}
You can put the header text you need in the value of header-center.
This question and its answers are quite old and did not work for me.
Versions used:
$ wkhtmltopdf --version
wkhtmltopdf 0.12.6 (with patched qt)
Python 3.8
For wkhtmltopdf, header-html and footer-html can only be a URI, e.g. an HTML URL or a file path. They CANNOT be an HTML string. So the idea is to keep each HTML file in the cloud, or to create a local temp file for the header and footer and reference that.
Request:
content as a URL or an HTML string
header as an HTML URL or an HTML string
footer as an HTML URL or an HTML string
Example:
import logging
import os
import tempfile
import pdfkit
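from flask import Flask, Response, make_response, request
from flask_restx import Resource, Api, fields
app = Flask(__name__)
api = Api(app)
# example_content_html, example_header_html and example_footer_html are
# sample HTML strings assumed to be defined elsewhere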
Request = api.model('Request', {
    'url': fields.String(
        required=False,
        description='url',
        example='https://www.w3schools.com/html/html5_svg.asp',
    ),
    'html': fields.String(
        required=False,
        description='content html string',
        example=example_content_html
    ),
    'header_html': fields.String(
        required=False,
        description='pdf header html string',
        example=example_header_html
    ),
    'footer_html': fields.String(
        required=False,
        description='pdf footer html string',
        example=example_footer_html
    ),
})
@api.route("/convert_html_to_pdf", endpoint = 'html2pdf')
@api.representation('application/octet-stream')
class PdfConverter(Resource):
    @api.doc(body=Request)
    def post(self):
        logging.info(request.json)
        url = request.json.get('url')
        html = request.json.get('html')
        header_html = request.json.get('header_html')
        footer_html = request.json.get('footer_html')
        header_uri = 'https://xxxxx/header.html' # default header
        footer_uri = 'https://xxxxx/footer.html' # default footer
        if header_html:
            fph = tempfile.NamedTemporaryFile(suffix='.html')
            fph.write(header_html.encode('utf-8'))
            fph.flush()
            header_uri = fph.name
        if footer_html:
            fpf = tempfile.NamedTemporaryFile(suffix='.html')
            fpf.write(footer_html.encode('utf-8'))
            fpf.flush()
            footer_uri = fpf.name
        options = {
            'page-size': 'A4',
            'margin-top': '32mm',
            'header-spacing': 6,
            'footer-spacing': 6,
            'header-html': header_uri,
            'footer-html': footer_uri,
            'margin-right': '0',
            'margin-bottom': '16mm',
            'margin-left': '0',
            'encoding': "UTF-8",
            'cookie': [
                ('cookie-empty-value', '""'),
                ('cookie-name1', 'cookie-value1'),
                ('cookie-name2', 'cookie-value2'),
            ],
            'no-outline': None
        }
        logging.info(options)
        if url:
            # api.payload['requestedBlobUrl'] = url
            # return api.payload
            pdf = pdfkit.from_url(url, options=options)
        else:
            pdf = pdfkit.from_string(html, options=options)
        if header_html:
            fph.close()
        if footer_html:
            fpf.close() # close will delete the temp file
        response = make_response(pdf)
        response.headers['Content-Type'] = 'application/pdf'
        response.headers['Content-Disposition'] = 'attachment;filename=report.pdf'
        return response
The Python pdfkit wrapper only supports HTML files for the header and footer. To support strings, all you need to do is generate a temporary file and delete it on close. Here is my code sample.
Using tempfile with the delete=True and suffix='.html' arguments creates a file that is deleted on temp.close().
import tempfile
temp = tempfile.NamedTemporaryFile(delete=True,suffix='.html')
with open(temp.name, 'w') as f:
    f.write("""
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
</head>
<body>
Code of your header goes here.
</body>
</html>
""")
options = {
    'page-size': 'A4',
    'margin-top': '1in',
    'margin-bottom': '0.75in',
    'margin-right': '0.75in',
    'margin-left': '0.75in',
    'encoding': "UTF-8",
    'header-html': temp.name,
    'footer-center': "Page [page] of [topage]",
    'footer-font-size': "9",
    'custom-header': [
        ('Accept-Encoding', 'gzip')
    ],
    'enable-local-file-access': False,
    'no-outline': None
}
pdf = pdfkit.from_string(html_string, options=options)
temp.close()
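The temp-file lifecycle the answers above rely on can be checked in isolation. This is only a sketch: write_temp_html is a hypothetical helper name, the pdfkit call is left as a comment so the block runs without wkhtmltopdf installed, and it assumes a POSIX system (on Windows a still-open NamedTemporaryFile generally cannot be opened by another process).

```python
import os
import tempfile

def write_temp_html(html_text):
    """Write html_text to a temporary .html file and return the open file object."""
    temp = tempfile.NamedTemporaryFile(
        mode="w", suffix=".html", delete=True, encoding="utf-8"
    )
    temp.write(html_text)
    temp.flush()  # make sure the content is on disk before another process reads it
    return temp

header = write_temp_html("<!DOCTYPE html><html><body>Header</body></html>")
options = {"header-html": header.name, "encoding": "UTF-8"}
existed_while_open = os.path.exists(header.name)

# pdfkit.from_string(html_string, False, options=options) would go here

header.close()  # delete=True removes the file on close
exists_after_close = os.path.exists(header.name)
print(existed_while_open, exists_after_close)
```

The design point is simply that the file must stay open (or at least undeleted) until the PDF has been generated, and must carry the .html suffix.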
