Are there any ways to create a rounded corner table with matplotlib?
Expected result.
Current result.
table.py
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({
    'Name':['alice','bob','charlie','dave','eve','frank','Sum'],
    'Age':[14,35,64,7,19,25,164]})[['Name','Age']]
fig, ax = plt.subplots(figsize=(2,2))
ax.axis('off')
ax.axis('tight')
ax.table(cellText=df.values,
         colLabels=df.columns,
         loc='center',
         bbox=[0,0,1,1])
plt.savefig('table.png')
Run in docker container
$ export MPLBACKEND="agg"
$ python3 table.py
I tried to use the bbox of matplotlib.table, but it didn't work. As a workaround I tried a different (and rather hacky) approach: saving the table as HTML and then cropping it into an image. (It's a very clumsy way to do it.)
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({
    'Name':['alice','bob','charlie','dave','eve','frank','Sum'],
    'Age':[14,35,64,7,19,25,164]})[['Name','Age']]
html_template = '''
<!DOCTYPE html>
<html lang="jp" dir="ltr">
<head>
<meta charset="utf-8">
<title>table</title>
<style type="text/css">
table {{
border: 1px solid #ccc;
border-collapse: separate;
border-radius: 5px;
border-spacing: 0;
}}
table th, table td {{
border-bottom: 1px solid #ccc;
padding: 10px 20px;
}}
table th {{
border-right: 1px solid #ccc;
background: #EEE;
}}
table tr:first-child th {{
border-radius: 5px 0 0 0;
}}
table tr:first-child td {{
border-radius: 0 5px 0 0;
}}
table tr:last-child th {{
border-bottom: none;
border-radius: 0 0 0 5px;
}}
table tr:last-child td {{
border-bottom: none;
border-radius: 0 0 5px 0;
}}
</style>
</head>
<body>
<div class="container">
{table}
</div>
</body>
</html>
'''
table = df.to_html(classes=['table',"table-bordered", "table-hover"])
html = html_template.format(table=table)
with open("table.html", "w") as f:
    f.write(html)
I’m trying to create a PDF file for a payment receipt, but I can’t figure out how to set a border for it.
As the border I want to use this image:
But after converting to PDF, the next page looks like this:
How can I keep the border the same on all pages?
Python + Django code:
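import os
from django.core.files import File
from django.template.loader import render_to_string
from weasyprint import HTML

# DATA, MEDIA_URL and transaction come from the surrounding project code
html_string = render_to_string('receipt.html', DATA)
html = HTML(string=html_string)
result = html.write_pdf()
pdf_path = os.path.join(MEDIA_URL + "invoice_receipt/", 'temp.pdf')
# Close the file after writing so the PDF is fully flushed before it is read back
with open(pdf_path, 'wb') as f:
    f.write(result)
with open(pdf_path, 'rb') as f:
    transaction.receipt_file = File(f)
    transaction.save()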
receipt.html template:
<style>
table tbody tr td{
border-top: unset !important;
}
table tbody tr:nth-child(7) td,
table tbody tr:nth-child(8) td,
table tbody tr:nth-child(9) td,
table tbody tr:nth-child(10) td,
table tbody tr:nth-child(11) td,
table tbody tr:nth-child(12) td
{
padding-top: 0;
padding-bottom: 0;
}
.amount-in-words{
border-bottom:3px solid black;
}
.table thead th {
vertical-align: bottom;
border-bottom: 4px solid black;
}
/* .invoice-template{
padding: 20px;
border: 20px solid transparent;
border-image: linear-gradient(to right,#633363 50%,#f3c53d 50%);
border-image-slice: 1;
} */
.logo{
margin-top: 2rem;
}
.logo2{
margin-top: 2rem;
height: 160px;
width:200px;
}
.invoice-template{
padding: 20px;
background-image: url('https://dev-api.test.com/files/files/DumpData/Frame.png');
background-repeat: no-repeat;
background-size: contain;
break-inside: auto;
}
.main-container{
border: 1px solid black;
padding: 20px 10px;
background: white;
}
p {
font-weight: 500;
}
</style>
</head>
<body>
<div class="container invoice-template">
<!-- <div class="main-container"> -->
<div class="row justify-content-center">
<div class="col-md-5 logo"><img src={{ logo }} class="logo2"></div>
<div class="col-md-5 text-right">
<ul style="list-style: none; color: purple; margin-top: 2rem;">
<li>{{ phone }}<span></span></li>
<li><p>{{ email }}<br>{{ website }}</p><span></span></li>
<li>Resource Factory Pvt. Ltd.<br>{{ shop_address|linebreaksbr }}<span></span></li>
</ul>
</div>
</div>
<div class="row text-center">
<div class="col-md-12"><h6>INVOICE</h6></div>
</div>
<div class="row justify-content-center">
<div class="col-md-5">
<p>
To,<br>
{{ user_name }}<br>
{{ user_address|linebreaksbr }}
</p>
<p>Client GST Number.:</p>
</div>
<div class="col-md-5 text-center">
<p>Date: {{ order_date|date:"d-m-Y" }}</p>
<p>Invoice No. {{ invoice }}</p>
</div>
</div>
This is a shortened version of my HTML code. Let me know if you need the full code.
The behavior of box decorations when a box is split across pages (like your main <div> here) is controlled by box-decoration-break. The default is slice, which renders the decorations for the whole box and then slices them at the break. clone instead computes the borders on each fragment of the box:
.invoice-template {
box-decoration-break: clone;
}
I am having issues comparing two HTML files using Python's difflib. I was able to generate a comparison file that highlights the changes, but when I open it, it displays the raw HTML and CSS tags instead of the plain text.
E.g.
<Html><Body><div class="a"> Text Here</div></Body></html>
instead of
Text Here
My Python Script is as follows:
import difflib
file1 = open('file1.html', 'r').readlines()
file2 = open('file2.html', 'r').readlines()
diffHTML = difflib.HtmlDiff()
htmldiffs = diffHTML.make_file(file1,file2)
with open('Comparison.html', 'w') as outFile:
    outFile.write(htmldiffs)
My input files look something like this:
<!DOCTYPE html>
<html>
<head>
<title>Text here</title>
<style type="text/css">
@media all {
h1 {
margin: 0px;
color: #222222;
}
#page-title {
color: #222222;
font-size: 1.4em;
font-weight: bold;
}
body {
font: 0.875em/1.231 tahoma, geneva, verdana, sans-serif;
padding: 30px;
min-width: 800px;
}
.divider {
margin: 0.5em 15% 0.5em 10%;
border-bottom: 1px solid #000;
}
}
.section.header {
background-color: #303030;
color: #ffffff;
font-weight: bold;
margin: 0 0 5px 0;
padding: 5px 0 5px 5px;
}
.section.subheader {
background-color: #CFCFCF;
color: #000;
font-weight: bold;
padding: 1px 5px;
margin: 0px auto 5px 0px;
}
.response_rule_prefix {
font-style: italic;
}
.exception-scope
{
color: #666666;
padding-bottom: 5px;
}
.where-clause-header
{
color:#AAAA99;
}
.section {
padding: 0em 0em 1.2em 0em;
}
#generated-Time {
padding-top:5px;
float:right;
}
#page-title, #generated-Time {
display: inline-block;
}
</style></head>
<body>
<div id="title-section" class="section ">
<div id="page-branding">
<h1>Title</h1>
</div>
<div id="page-title">
Sub title
</div>
<div id="generated-Time">
Date & Time : Jul 2, 2020 2:42:48 PM
</div>
</div>
<div class="section header">General</div>
<div id="general-section" class="section">
<div class="general-detail-label-container">
<label id="policy-name-label">Text here</label>
</div>
<div class="general-detail-content-container">
<span id="policy-name-content" >Text here</span>
</div>
<div class="general-detail-label-container">
<label id="policy-description-label">Description A :</label>
</div>
<div class="general-detail-content-container">
<span id="policy-description-content""></span>Text here</span>
</div>
<div class="general-detail-label-container">
<label id="policy-label-label" class="general-detail-label">Description b:</label>
</div>
<div class="general-detail-content-container">
<span id="policy-label-content" class="wrapping-text"></span>
</div>
<div class="general-detail-label-container">
<label id="policy-group-label" class="general-detail-label">Group:</label>
</div>
<div class="general-detail-content-container">
<span id="policy-group-content" class="wrapping-text">Text here</span>
</div>
<div class="general-detail-label-container">
<label id="policy-status-label" class="general-detail-label">Status:</label>
</div>
<div class="general-detail-content-container">
<span id="policy-status-content">
<label id="policy-status-message">Active</label>
</span>
</div>
<div class="general-detail-label-container">
<label id="policy-version-label" class="general-detail-label">Version:</label>
</div>
<div class="general-detail-content-container">
<span id="policy-version-content" class="wrapping-text">7</span>
</div>
<div class="general-detail-label-container">
<label id="policy-last-modified-label" class="general-detail-label">Last Modified:</label>
</div>
<div class="general-detail-content-container">
<span id="policy-last-modified-content" class="wrapping-text">Jun 15, 2020 2:41:48 PM</span>
</div>
</div>
</body>
</html>
Assuming that you only care about changes to the text and not changes to the HTML, you could strip the HTML from the output after the comparison. There are a number of ways to achieve this. The two that first came to mind were:
RegEx, because it is built into Python, and
BeautifulSoup, because it was created to read web pages
Create a function, using either of the above methods, to strip the HTML from the output,
e.g. using BeautifulSoup
UPDATE
Reading through the documentation again, it seems to me that the comparison will always yield the actual HTML too. However, you could create an additional output that only shows the text changes.
Also, to avoid showing the whole document, I've set the context parameter to True.
Using BeautifulSoup
import difflib
from bs4 import BeautifulSoup, Comment

def remove_html_bs(raw_html):
    "Returns the visible text nodes of an HTML fragment"
    data = BeautifulSoup(raw_html, 'html.parser')
    data = data.findAll(text=True)

    def visible(element):
        # Skip text that belongs to non-rendered tags
        if element.parent.name in ['style', 'script', '[document]', 'head', 'title']:
            return False
        # Skip HTML comments (BeautifulSoup parses them as Comment nodes)
        elif isinstance(element, Comment):
            return False
        return True

    result = list(filter(visible, data))
    return result

def compare_files(file1, file2):
    "Creates two comparison files"
    file1 = file1.readlines()
    file2 = file2.readlines()
    # Difference line by line - HTML
    difference_html = difflib.HtmlDiff(tabsize=8).make_file(file1, file2, context=True, numlines=5)
    # Difference line by line - raw
    difference_file = set(file1).difference(file2)
    # List of differences by line index
    difference_index = []
    for dt in difference_file:
        for index, t in enumerate(file1):
            if dt in t:
                difference_index.append(f'{index}, {remove_html_bs(dt)[0]}, {remove_html_bs(file2[index])[0]}')
    # Write entire line with changes
    with open('comparison.html', 'w') as outFile:
        outFile.write(difference_html)
    # Write only text changes, by index
    with open('comparison.txt', 'w') as outFile:
        outFile.write('LineNo, File1, File2\n')
        outFile.write('\n'.join(difference_index))
    return difference_html, difference_file

with open('file1.html', 'r') as file1, open('file2.html', 'r') as file2:
    difference_html, difference_file = compare_files(file1, file2)
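An alternative sketch, assuming you would rather strip the markup before comparing instead of after: extract the visible text from each file with the standard library's html.parser, then diff only the text lines. The names TextExtractor, visible_text, and text_diff are made up for this illustration, and difflib.unified_diff is swapped in for HtmlDiff to keep the output plain text.

```python
import difflib
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <style>, <script>, and <title>."""

    SKIPPED = {"style", "script", "title"}

    def __init__(self):
        super().__init__()
        self.lines = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside the skipped tags
        text = data.strip()
        if text and self._skip_depth == 0:
            self.lines.append(text)


def visible_text(html):
    """Return the visible text of an HTML string as a list of lines."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.lines


def text_diff(html1, html2):
    """Return unified-diff lines covering only the text changes."""
    return list(difflib.unified_diff(
        visible_text(html1), visible_text(html2),
        fromfile="file1", tofile="file2", lineterm=""))


old = '<html><head><style>h1 {color: red;}</style></head><body><div class="a">Text Here</div></body></html>'
new = '<html><head><style>h1 {color: blue;}</style></head><body><div class="a">Text There</div></body></html>'
for line in text_diff(old, new):
    print(line)
```

Because the CSS is dropped before the comparison, a change that only touches the stylesheet does not show up in the diff; only the changed text does.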
I have created a simple Flask app which renders a template 'index.html' and in that HTML I am attempting to list various plots as a sort of dashboard-style webpage with other content. I know the basics of Flask and Dash though am not using Dash as I want to have more control over the HTML/CSS hence using Flask to create a website to embed the graphs using Plotly.
So far I've had no luck with any of the official documentation or any medium.com or suchlike articles. The closest I have come to is this answer: Embedding dash plotly graphs into html
However, it isn't working when I run my app and the browser launches in localhost. Instead it just gives me a lot of text which is clearly the plotly figure, but it isn't turning into a graph.
Here is all my py/html/css even if the navbar stuff isn't relevant; just in case (I am still learning so I'm sure there will be some better ways to do things..)
Thanks for any help.
DataFrame class which grabs the latest Coronavirus data and returns as pandas.dataframe:
import pandas as pd
import requests

class DataFrame:
    """
    Class which grabs the data live from the ECDC and returns it in a pandas dataframe
    """
    def __init__(self):
        """
        Creating the pandas dataframe of the ECDC JSON data
        """
        self.url = "https://opendata.ecdc.europa.eu/covid19/casedistribution/json"
        self.file = requests.get(self.url).json()
        self.file = self.file['records']
        self.df = pd.DataFrame(data=self.file)

    def converter(self):
        """
        Converting the dtypes from object to int for ints, and date to date
        Also renames the columns to more visual-friendly names
        :return: None
        """
        self.df['cases'] = self.df['cases'].astype(int)
        self.df['deaths'] = self.df['deaths'].astype(int)
        self.df['popData2018'] = self.df['popData2018'].astype(str).replace('', 0).astype(int)
        # Parse the date strings; dayfirst=True assumes the feed uses day/month/year order
        self.df['dateRep'] = pd.to_datetime(self.df['dateRep'], dayfirst=True)
        cols_rename = 'date day month year cases deaths country geo_id country_id population continent'.split()
        cols_rename = [s.capitalize() for s in cols_rename]
        self.df.columns = cols_rename

    def return_df(self):
        """
        :return: pandas DataFrame
        """
        self.converter()
        return self.df
app.py
from plotly.offline import plot
import plotly.graph_objects as go
from dataframe.dataframe import DataFrame
from flask import Flask, render_template, redirect, request, url_for

app = Flask(__name__)

def graph_maker():
    df = DataFrame().return_df()
    data = []
    for continent in df['Continent'].unique():
        df_filt = df[df['Continent'] == continent]
        data.append(go.Scatter(x=df_filt["Cases"],
                               y=df_filt["Deaths"],
                               mode='markers',
                               text=df_filt['Country'],
                               name=continent))
    layout = go.Layout(title="Deaths (Y) v Cases (X) by continent")
    fig = go.Figure(data=data, layout=layout)
    return plot(figure_or_data=fig,
                include_plotlyjs=False,
                output_type='div')

@app.route('/')
def index():
    graph = graph_maker()
    return render_template('index.html',
                           graph=graph)

if __name__ == '__main__':
    app.run(debug=True)
index.html
{% extends "navbar.html" %}
<head>
<meta charset="UTF-8">
<link type="text/css" rel="stylesheet" href="..\static\master.css">
<link href="https://fonts.googleapis.com/css2?family=Maven+Pro&display=swap" rel="stylesheet">
<!-- Plotly.js -->
<script src="https://cdn.plot.ly/plotly-latest.min.js"></script>
</head>
{% block nbar %}
<body>
<div class="global-box" id="global-stats">
<h1>Global charts</h1>
<p>Title here</p>
<ul class="global-box-ul">
<li class="global-box-ul-li">
{{ graph }}
</li>
<li class="global-box-ul-li">
Another chart here
</li>
</ul>
</div>
</body>
{% endblock %}
navbar.html
<!DOCTYPE html>
<html lang="en">
<head>
<title>C19DB</title>
<meta charset="UTF-8">
<link type="text/css" rel="stylesheet" href="..\static\master.css">
<link href="https://fonts.googleapis.com/css2?family=Maven+Pro&display=swap" rel="stylesheet">
</head>
<nav class="navbar">
<div class="logo">c19db</div>
<div class="list">
<ul class="navbar_items">
<li class="navbar_item">Dashboard</li>
<li class="navbar_item">About</li>
<li class="navbar_item">Register</li>
</ul>
</div>
</nav>
{% block nbar %}
{% endblock %}
</html>
master.css
html, body {
font-family: 'Maven Pro';
height: 700px;
margin: 0;
}
.navbar {
background: rgb(237, 232, 232);
vertical-align: middle;
}
.logo {
vertical-align: middle;
display: inline-block;
color: rgb(196, 69, 69);
font-size: 50px;
width: 250px;
padding: 5px 15px 5px 15px;
}
.list{
vertical-align: middle;
display: inline-block;
width: calc(100% - 285px);
text-align: right;
}
.navbar_items {
list-style: none;
font-size: 20px;
color: rgb(61, 61, 61)
}
.navbar_item{
display: inline-block;
padding: 5px 15px 5px 15px;
}
a {
text-decoration: none;
}
.navbar_item > a{
display: inline-block;
padding: 5px 15px 5px 15px;
color: rgb(61, 61, 61);
}
.navbar_item > a:hover {
display: inline-block;
padding: 5px 15px 5px 15px;
color: rgb(196, 69, 69);
}
.footer, .footer a {
position: relative;
background: rgb(237, 232, 232, 0.2);
width: 100%;
color: rgb(61, 61, 61, 0.2);
text-align: center;
}
span {
font-weight: bold;
}
.global-box {
text-align: center;
border: 2px black solid;
list-style: none;
margin: auto;
}
.global-box > h1, .global-box > p {
margin: 1px;
}
ul {
display: contents;
}
.global-box-ul-li {
display: inline-block;
border: 2px lightblue solid;
list-style: none;
margin: auto;
width: 48%;
height: 100%;
}
Thank you for any help!
I have solved this problem.
Nutshell:
Create a chart
Call pyo.plot() as normal passing through the fig, output_type='div' and include_plotlyjs=False
Have that output to a variable passed through Markup() (import from flask)
Have the Markup(variable) passed through the render_template like you would a form
Have the variable rendered in the html using {{ jinja template }}
First, create your Plotly chart like normal. I will not give a whole example but just the key points. I create charts in functions for import and use in multiple pages if necessary. In this case, it's necessary because the chart must be assigned to a variable.
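import plotly.offline as pyo
from flask import Markup  # on newer Flask versions: from markupsafe import Markup

def my_bar_chart():
    # ... build fig here (snipped as irrelevant) ...
    my_bar_chart = pyo.plot(fig, output_type='div', include_plotlyjs=False)
    return Markup(my_bar_chart)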
Now import your function to your app.py / wherever your views are and pass it through render template as you would any form, for example.
Here is an example:
def my_page():
    my_bar_chart_var = my_bar_chart()
    return render_template('my_page.html',
                           bar_chart_1=my_bar_chart_var)
Then on the html for that page simply pass through bar_chart_1 in a jinja template like so:
{{ bar_chart_1 }}
And done.
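To see why the Markup() step matters: Flask autoescapes values in .html templates, so the characters that make up the <div> are turned into HTML entities and the browser prints the markup as text instead of rendering it. The sketch below imitates that with the standard library's html.escape as a stand-in for the escaping (it is not Flask's actual code path), and chart_div is a made-up placeholder for the string pyo.plot() returns.

```python
import html

# Hypothetical stand-in for the string returned by pyo.plot(..., output_type='div')
chart_div = '<div id="chart"><script>Plotly.newPlot("chart", []);</script></div>'

# Roughly what reaches the browser when the plain string is autoescaped:
escaped = html.escape(chart_div)
print(escaped)

# A string marked as safe with Markup() is emitted unchanged, so the browser
# receives real tags and can run the script that draws the chart:
print(chart_div)
```

The first line printed contains only entities such as &lt; and &gt;, which is the "lot of text" described in the question.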
I have a Python program that receives a CSV file from the user (using pandas.read_csv) and returns the size of the CSV file as output.
My requirement is to have a GUI with an upload button that shows the size of the dataset in the GUI itself. I am a beginner in Python and Flask. Could anyone guide me on how to achieve this in Flask?
I tried the following code, but it says "Internal Server Error", and I don't know how to implement the upload button.
import pandas as pd
from flask import Flask

app = Flask(__name__)

@app.route('/')
def data_shape():
    data = pd.read_csv('iris.csv') # there should be an upload option / button for this dataset in the flask webservice
    return data.shape

if __name__ == '__main__':
    app.run(host='0.0.0.0')
I have to upload the csv file in the webservice and get the output of the python code in the webservice itself.
app.py
import os
import shutil

import pandas as pd
from flask import Flask, flash, request, redirect, render_template
from werkzeug.utils import secure_filename

basedir = os.path.abspath(os.path.dirname(__file__))
UPLOAD_FOLDER = os.path.join('static', 'csv')

app = Flask(__name__)
app.secret_key = "secret key"
app.config['UPLOAD_FOLDER'] = UPLOAD_FOLDER
app.config['MAX_CONTENT_LENGTH'] = 16 * 1024 * 1024
ALLOWED_EXTENSIONS = set(['csv','xls'])

def allowed_file(filename):
    return '.' in filename and filename.rsplit('.', 1)[1].lower() in ALLOWED_EXTENSIONS

@app.route('/')
def upload_form():
    return render_template('index.html')

@app.after_request
def add_header(response):
    """
    Add headers to both force latest IE rendering engine or Chrome Frame,
    and also to cache the rendered page for 10 minutes.
    """
    response.headers['X-UA-Compatible'] = 'IE=Edge,chrome=1'
    response.headers['Cache-Control'] = 'public, max-age=0'
    return response

@app.route('/', methods=['POST'])
def upload_file():
    # shutil.rmtree(UPLOAD_FOLDER)
    # os.mkdir(UPLOAD_FOLDER)
    disp_div = 'none'
    disp_div_tumor = 'none'
    d = request.form.to_dict()
    # print("dddd;",d)
    button_name = 'None'
    if (len(d)!=0):
        button_name = list(d.items())[-1][0]
    file = request.files['file']
    print("file:",file)
    if file.filename == '':
        flash('No file selected for uploading','red')
        # return redirect(request.url)
        return render_template('index.html', disp_div = disp_div)
    if file and allowed_file(file.filename):
        filename = secure_filename(file.filename)
        # Empty the upload folder so it only ever holds the latest file
        shutil.rmtree(UPLOAD_FOLDER)
        os.mkdir(UPLOAD_FOLDER)
        file.save(os.path.join(app.config['UPLOAD_FOLDER'], filename))
        flash('File successfully uploaded!', 'green')
        print(UPLOAD_FOLDER)
        print("==>",os.path.join(UPLOAD_FOLDER, sorted(os.listdir(app.config['UPLOAD_FOLDER']))[0]))
        csv_file = pd.read_csv(os.path.join(UPLOAD_FOLDER, sorted(os.listdir(app.config['UPLOAD_FOLDER']))[0]))
        csv_shape = csv_file.shape
        return render_template('index.html', csv_shape=csv_shape)
        # return redirect('/')
    else:
        flash('Allowed file types are csv, xls', 'red')
        # return redirect(request.url)
        return render_template('index.html')

if __name__ == '__main__':
    app.run(debug=False, port=5006)
    ## For deploying the app use `app.run(debug=False, host="0.0.0.0", port=80)`
templates/index.html
<link rel="stylesheet" type="text/css" href="/static/css/main.css">
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<script src="static/js/index.js"></script>
<script>
$(document).ready(function(){
$("#target").on('submit',function(){
// alert("It works");
});
});
</script>
<!doctype html>
<html>
<head>
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.7.0/css/font-awesome.min.css">
</head>
<body>
<form id="target" method="post" enctype="multipart/form-data">
<div name ="up" class="upload-btn-wrapper">
<button name="upload" class="btn">Upload CSV File</button>
<input type="file" id="file" value="go" name="file" onchange="$('#target').submit();"/>
</div>
</form>
<p>
{% with messages = get_flashed_messages(with_categories=true) %}
{% if messages %}
<div class=flashes>
{% for category_color, message in messages %}
<p class="error_text" style="color:{{ category_color }};width:500px;">{{ message }}</p>
{% endfor %}
</div>
{% endif %}
{% endwith %}
</p>
===> {{ csv_shape }}
</body>
</html>
static/css/main.css:
.upload-btn-wrapper {
position: absolute;
overflow: hidden;
display: inline-block;
top:0;
left:5%;
}
.btn {
width: 15vw;
height: 4vh;
padding: 0 0 2px;
font-size: 2.2vh;
/* font: 90% "Trebuchet MS", Tahoma, Arial, sans-serif; */
font-family: sans-serif;
font-weight: bold;
line-height: 32px;
text-transform: uppercase;
margin: 0.2em auto;
display: block;
outline: none;
position: relative;
cursor: pointer;
border-radius: 3px;
color: #ffffff;
text-shadow: 1px 1px #024bde;
border: 1px solid #507def;
border-top: 1px solid #2f73ff;
border-bottom: 1px solid #2a67ff;
box-shadow: inset 0 1px #4a82ff, inset 1px 0 #2653b9, inset -1px 0 #2d69e8, inset 0 -1px #4372e8, 0 2px #1c3d9e, 0 6px #2553a2, 0 4px 2px rgba(0,0,0,0.4);
background: -moz-linear-gradient(top, #cae285 0%, #a3cd5a 100%);
background: -webkit-gradient(linear, left top, left bottom, color-stop(0%,#cae285), color-stop(100%,#a3cd5a));
background: -webkit-linear-gradient(top, #6292ff 0%,#2b6cff 100%);
background: -o-linear-gradient(top, #cae285 0%,#a3cd5a 100%);
background: -ms-linear-gradient(top, #cae285 0%,#a3cd5a 100%);
background: linear-gradient(top, #cae285 0%,#a3cd5a 100%);
filter: progid:DXImageTransform.Microsoft.gradient( startColorstr='#cae285', endColorstr='#a3cd5a',GradientType=0 );
background-color: #1c3c9e;
}
.btn::-moz-focus-inner{border:0}
.btn:active {
/*top: 3px;*/
/* border: 1px solid #88A84E;
border-top: 1px solid #6E883F;
border-bottom: 1px solid #95B855;
background: #A7CF5F;*/
box-shadow: inset 0 1px 2px #779441;
transform: translateY(3px);
}
.pred {
position: absolute;
left: 10%;
bottom: 0;
}
.upload-btn-wrapper input[type=file] {
font-size: 100px;
position: absolute;
left: 0;
top: 0;
opacity: 0;
}
.submit_btn {
border: 2px solid gray;
color: gray;
background-color: white;
padding: 8px 20px;
border-radius: 8px;
font-size: 20px;
font-weight: bold;
}
.error_text {
position: absolute;
top: 2.7vh;
font-weight: bold;
font-size: 2vh;
left: 5%;
}
Directory structure:
|-- fapp.py
|-- static
|   |-- css
|   |   `-- main.css
|   `-- csv
`-- templates
    `-- index.html
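Stripped of the styling and flash messages, the core of the upload handler is small. Here is a minimal sketch, assuming only the standard library's csv module in place of pandas: csv_shape is a hypothetical helper that takes the uploaded bytes (what request.files['file'].read() would give you) and returns (rows, columns) the way data.shape does, not counting the header row. shape_message is also made up; it turns the result into text, because a view should return a string or a rendered template. Returning the bare tuple, as in the question, is a likely cause of the Internal Server Error, since Flask interprets a returned tuple as (body, status, ...).

```python
import csv
import io


def csv_shape(raw_bytes, encoding="utf-8"):
    """Return (n_rows, n_cols) for CSV content, not counting the header row."""
    reader = csv.reader(io.StringIO(raw_bytes.decode(encoding), newline=""))
    header = next(reader, None)
    if header is None:
        return (0, 0)
    n_rows = sum(1 for row in reader if row)  # blank lines are skipped
    return (n_rows, len(header))


def shape_message(shape):
    """Format the shape as text so it can be returned from a view or shown in a template."""
    rows, cols = shape
    return f"{rows} rows x {cols} columns"


sample = b"sepal_length,sepal_width,species\n5.1,3.5,setosa\n4.9,3.0,setosa\n"
print(shape_message(csv_shape(sample)))
```

Reading the upload in memory like this also avoids having to clear and recreate the upload folder on every request, at the cost of not keeping the file on disk.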
I am building an HTML table from a list with lxml.builder and trying to put a link in one of the table cells.
The list is generated in the following way:
from lxml import etree

with open('some_file.html', 'r') as f:
    table = etree.parse(f)
p_list = list()
rows = table.iter('div')
p_list.append([c.text for c in rows])
rows = table.xpath("body/table")[0].findall("tr")
for row in rows[2:]:
    p_list.append([c.text for c in row.getchildren()])
HTML file which I parse is the same that is generated further by lxml, i.e. I set up some sort of recursion for testing purposes.
And here is how I build table
from lxml.builder import E
page = (
    E.html(
        E.head(
            E.title("title")
        ),
        E.body(
            ....
            *[E.tr(
                *[
                    E.td(E.a(E.img(src=str(col)))) if ind == 8 else
                    E.td(E.a(str(col), href=str(col))) if ind == 9 else
                    E.td(str(col)) for ind, col in enumerate(row)
                ]
            ) for row in p_list ]
When I specify the link with literals, everything works fine.
E.td(E.a("link", href="url_address"))
However, when I try to output a list element's value (which is https://blahblahblah.com) as a link
E.td(E.a(str(col), href=str(col)))
the cell is empty; nothing is shown in it.
If I specify the link text as a literal and put str(col) into href, the link is shown normally, but instead of the real href it contains the name of the generated HTML file.
If I output just that col value as a string
E.td(str(col))
it is shown normally, i.e. it is not empty. What is wrong with the E.a and E.img elements?
I just noticed that this happens only if I build the list from the HTML file. When I build the list manually, like this, everything is output fine.
p_list = []
p_element = ['id']
p_element.append('value')
p_element.append('value2')
p_list.append(p_element)
Current output (pay attention to the <a> tags and their href attributes)
<html>
<head>
<title>page</title>
</head>
<body>
<style type="text/css">
th {
background-color: DeepSkyBlue;
text-align: center;
vertical-align: bottom;
height: 150px;
padding-bottom: 3px;
padding-left: 5px;
padding-right: 5px;
}
.vertical {
text-align: center;
vertical-align: middle;
width: 20px;
margin: 0px;
padding: 0px;
padding-left: 3px;
padding-right: 3px;
padding-top: 10px;
white-space: nowrap;
-webkit-transform: rotate(-90deg);
-moz-transform: rotate(-90deg);
}</style>
<h1>title</h1>
<p>This is another paragraph, with a</p>
<table border="2">
<tr>
<th>
<div class="vertical">ID</div>
</th>
...
<th>
<div class="vertical">I blacklisted him</div>
</th>
</tr>
<tr>
<td>1020</td>
<td>ТаисияСтрахолет</td>
<td>No</td>
<td>Female</td>
<td>None</td>
<td>Санкт-Петербург</td>
<td>Росiя</td>
<td>None</td>
<td>
<a>
<img src="
"/>
</a>
</td>
<td>
<a href="
">
</a>
</td>
...
</tr>
</table>
</body>
</html>
Desired output
<html>
<head>
<title>page</title>
</head>
<body>
<style type="text/css">
th {
background-color: DeepSkyBlue;
text-align: center;
vertical-align: bottom;
height: 150px;
padding-bottom: 3px;
padding-left: 5px;
padding-right: 5px;
}
.vertical {
text-align: center;
vertical-align: middle;
width: 20px;
margin: 0px;
padding: 0px;
padding-left: 3px;
padding-right: 3px;
padding-top: 10px;
white-space: nowrap;
-webkit-transform: rotate(-90deg);
-moz-transform: rotate(-90deg);
}</style>
<h1>title</h1>
<p>This is another paragraph, with a</p>
<table border="2">
<tr>
<th>
<div class="vertical">ID</div>
</th>
...
<th>
<div class="vertical">I blacklisted him</div>
</th>
</tr>
<tr>
<td>1019</td>
<td>МихаилПавлов</td>
<td>No</td>
<td>Male</td>
<td>None</td>
<td>Санкт-Петербург</td>
<td>Росiя</td>
<td>C.-Петербург</td>
<td>
<a>
<img src="http://i.imgur.com/rejChZW.jpg"/>
</a>
</td>
<td>
link
</td>
...
</tr>
</table>
</body>
</html>
Got it myself. The problem was not in generating the HTML but in parsing it. The parsing function didn't fetch the IMG and A tags nested in TD, so these elements of the list were empty. Due to the rigorous logic of the program (fetching from file + fetching from site API) I wasn't able to detect the cause of the issue.
The correct parsing logic should be:
for row in rows[1:]:
    data.append([
        c.find("a").text if c.find("a") is not None else
        c.find("img").attrib['src'] if c.find("img") is not None else
        c.text
        for c in row.getchildren()
    ])
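A small self-contained illustration of why c.text came back empty, using the standard library's xml.etree.ElementTree instead of lxml (an assumption made so the sketch runs without extra packages; both expose .text, .find(), and .attrib). The sample row and the cell_value helper are made up for the example, and cell_value is a variant of the logic above that looks for a nested <img> first.

```python
import xml.etree.ElementTree as ET

# Made-up row: a plain cell, an image wrapped in a link, and a text link
row = ET.fromstring(
    '<tr>'
    '<td>1019</td>'
    '<td><a><img src="http://i.imgur.com/rejChZW.jpg"/></a></td>'
    '<td><a href="https://blahblahblah.com">link</a></td>'
    '</tr>'
)


def cell_value(cell):
    """Return the image src, the link text, or the plain cell text, in that order."""
    img = cell.find(".//img")   # search descendants, not only direct children
    if img is not None:
        return img.attrib["src"]
    link = cell.find(".//a")
    if link is not None:
        return link.text
    return cell.text


naive = [c.text for c in row]            # cells that only hold child tags have no text
recovered = [cell_value(c) for c in row]
print(naive)
print(recovered)
```

The .text of a <td> only covers text that appears directly inside it before the first child element, which is why the cells holding <a> or <img> produced nothing when the list was rebuilt from the file.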
Just a guess but it looks like 'str()' could be escaping it. Try E.td(E.a(col, href=col))