Handling unicode in Python 2.7 - python

I'm getting this error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2661' in position 1409: ordinal not in range(128)
I'm very green to programming still, so have mercy on me and my ignorance. But I understand the error to be that it's not able to handle unicode characters. There's that at least one unicode char, but there could be countless others that'll perk up in that feed.
I've done some looking for others who've had similar problems, but I can't can't find a solution I understand or can make work.
#import library to do http requests:
import urllib
from xml.dom.minidom import parseString, parse
f = open('games.html', 'w')
document = urllib.urlopen('https://itunes.apple.com/us/rss/topfreemacapps/limit=300/genre=12006/xml')
dom = parse(document)
image = dom.getElementsByTagName('im:image')
title = dom.getElementsByTagName('title')
price = dom.getElementsByTagName('im:price')
address = dom.getElementsByTagName('id')
imglist = []
titlist = []
pricelist = []
addlist = []
i = 0
j = 20
k = 40
f.write('''\
<!DOCTYPE html>
<html>
<head>
<style type="text/css">
<!--
A:link {text-decoration: none; color: #246DA8;}
A:visited {text-decoration: none; color: #246DA8;}
A:active {text-decoration: none; color: #40A9E3;}
A:hover {text-decoration: none; color: #40A9E3;}
.box {
vertical-align:middle;
width: 180px;
height: 120px;
border: 1px solid #99c;
padding: 5px;
margin: 0px;
margin-left: auto;
margin-right: auto;
-moz-border-radius: 5px;
border-radius: 5px;
-webkit-border-radius: 5px;
background-color:#ffffff;
font-family: Arial, Helvetica, sans-serif; color: black;
font-size: small;
font-weight: bold;
}
-->
</style>
</head>
<body>
''')
for i in range(0,len(image)):
if image[i].getAttribute('height') == '53':
imglist.append(image[i].firstChild.nodeValue)
for i in range(1,len(title)):
titlist.append(title[i].firstChild.nodeValue)
for i in range(0,len(price)):
pricelist.append(price[i].firstChild.nodeValue)
for i in range(1,len(address)):
addlist.append(address[i].firstChild.nodeValue)
for i in range(0,20):
f.write('''
<div style="width: 600px;">
<div style="float: left; width: 200px;">
<div class="box" align="center">
<div align="center">
''' + titlist[i] + '''<br>
<img src="''' + imglist[i] + '''" alt="" width="53" height="53" border="0" ><br>
<span>''' + pricelist[i] + '''</span>
</div>
</div>
</div>
<div style="float: left; width: 200px;">
<div class="box" align="center">
<div align="center">
''' + titlist[i+j] + '''<br>
<img src="''' + imglist[i+j] + '''" alt="" width="53" height="53" border="0" ><br>
<span>''' + pricelist[i+j] + '''</span>
</div>
</div>
</div>
<div style="float: left; width: 200px;">
<div class="box" align="center">
<div align="center">
''' + titlist[i+k] + '''<br>
<img src="''' + imglist[i+k] + '''" alt="" width="53" height="53" border="0" ><br>
<span>''' + pricelist[i+k] + '''</span>
</div>
</div>
</div>
<br style="clear: left;" />
</div>
<br>
''')
f.write('''</body>''')
f.close()

The basic problem is that you're concatenating the Unicode strings with ordinary byte-strings without converting them using a proper encoding; in these cases, ASCII is used by default (which, clearly, can't handle extended characters).
The line in your script that does this is too long to quote, but another practical example which displays the same problem could look like this:
parameter = u"foo \u2661"
sys.stdout.write(parameter + " bar\n")
You will need to instead encode the Unicode strings with an explicitly specified encoding, e.g. like this:
parameter = u"foo \u2661"
sys.stdout.write(parameter.encode("utf8") + " bar\n")
In your case, you can do this in your loops so as to not have to specify it on every concatenation:
for i in range(1,len(title)):
titlist.append(title[i].firstChild.nodeValue.encode("utf8"))
--
Also, while we're at it, you can improve your code by not iterating through the elements using an integer index. For instance, instead of this:
title = dom.getElementsByTagName('title')
for i in range(1,len(title)):
titlist.append(title[i].firstChild.nodeValue.encode("utf8"))
... you can do this instead:
for title in dom.getElementsByTagName('title')
titlist.append(title.firstChild.nodeValue.encode("utf8"))

Related

Weasy-print convert to pdf with border image

I’m trying to create pdf file for payment receipt, but I’m not able to figure out how I should set border for it.
As border I want to use this image:
But while converting it to pdf, next page gets like this:
How can I make it constant border for all pages?
Python + Django code:
from weasyprint import HTML
html_string = render_to_string('receipt.html', DATA)
html = HTML(string=html_string)
result = html.write_pdf()
f = open(str(os.path.join(MEDIA_URL + "invoice_receipt/", 'temp.pdf')), 'wb')
f.write(result)
file_obj = File(open(MEDIA_URL + "invoice_receipt/" + "temp.pdf", 'rb'))
transaction.receipt_file = file_obj
transaction.save()
receipt.html template:
<style>
table tbody tr td{
border-top: unset !important;
}
table tbody tr:nth-child(7) td,
table tbody tr:nth-child(8) td,
table tbody tr:nth-child(9) td,
table tbody tr:nth-child(10) td,
table tbody tr:nth-child(11) td,
table tbody tr:nth-child(12) td
{
padding-top: 0;
padding-bottom: 0;
}
.amount-in-words{
border-bottom:3px solid black;
}
.table thead th {
vertical-align: bottom;
border-bottom: 4px solid black;
}
/* .invoice-template{
padding: 20px;
border: 20px solid transparent;
border-image: linear-gradient(to right,#633363 50%,#f3c53d 50%);
border-image-slice: 1;
} */
.logo{
margin-top: 2rem;
}
.logo2{
margin-top: 2rem;
height: 160px;
width:200px;
}
.invoice-template{
padding: 20px;
background-image: url('https://dev-api.test.com/files/files/DumpData/Frame.png');
background-repeat: no-repeat;
background-size: contain;
break-inside: auto;
}
.main-container{
border: 1px solid black;
padding: 20px 10px;
background: white;
}
p {
font-weight: 500;
}
</style>
</head>
<body>
<div class="container invoice-template">
<!-- <div class="main-container"> -->
<div class="row justify-content-center">
<div class="col-md-5 logo"><img src={{ logo }} class="logo2"></div>
<div class="col-md-5 text-right">
<ul style="list-style: none; color: purple; margin-top: 2rem;">
<li>{{ phone }}<span></span></li>
<li><p>{{ email }}<br>{{ website }}</p><span></span></li>
<li>Resource Factory Pvt. Ltd.<br>{{ shop_address|linebreaksbr }}<span></span></li>
</ul>
</div>
</div>
<div class="row text-center">
<div class="col-md-12"><h6>INVOICE</h6></div>
</div>
<div class="row justify-content-center">
<div class="col-md-5">
<p>
To,<br>
{{ user_name }}<br>
{{ user_address|linebreaksbr }}
</p>
<p>Client GST Number.:</p>
</div>
<div class="col-md-5 text-center">
<p>Date: {{ order_date|date:"d-m-Y" }}</p>
<p>Invoice No. {{ invoice }}</p>
</div>
</div>
I’m giving a short version of my html code. If needed full code please mention.
The behavior of box decorations when a box is split (like your main <div> here) is controller by box-decoration-break. Default is slice which breaks the borders after rendering them. clone will compute the borders on each part of the box:
.invoice-template {
box-decoration-break: clone;
}

how to convert this HTML file into a PDF using python [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 1 year ago.
This post was edited and submitted for review 1 year ago and failed to reopen the post:
Original close reason(s) were not resolved
Improve this question
I am having a process where my python code needs to generate a PDF.
I have an HTML file as follows:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link rel="stylesheet" href="./proforma_supply_bill.css">
<link rel="preconnect" href="https://fonts.gstatic.com">
<link href="https://fonts.googleapis.com/css2?family=Karla:wght#400;600;700&family=Rajdhani:wght#700&display=swap"
rel="stylesheet">
<title>Proforma bill of supply</title>
<style>
body {
font-family: "Karla", sans-serif;
margin: 0;
padding: 0;
box-sizing: border-box;
overflow-x: hidden;
}
th,
td {
padding: 0 !important;
text-align: left;
position: relative;
}
.mb-3 {
margin-bottom: 3px;
}
.mb-5 {
margin-bottom: 5px;
}
.pr-10{
padding-right: 10px !important;
}
.text-right {
text-align: right;
}
.border-b_color {
border-bottom: 1px solid #93d150;
}
.border-t {
border-top: 1px solid #dedede;
}
.border-b {
border-bottom: 1px solid #dedede;
}
.card {
background-color: #fff;
padding: 0 20px;
}
.card__header,
.card__total_amount,
.card__amount_section {
display: flex;
align-items: center;
justify-content: space-between;
padding: 17px 0;
}
.card__header_img {
width: 175px;
}
.card__header_title {
font-family: "Rajdhani", sans-serif;
font-size: 24px;
font-weight: bold;
color: #2b9eaa;
text-transform: uppercase;
}
.card__info_flex {
display: flex;
padding: 10px 0;
}
.card__info {
flex: 50%;
}
.card__info_row:not(:last-child) {
margin-bottom: 5px;
}
.card__info_title,
.card__table_data_row__content {
color: #141414;
font-size: 15px;
}
.card__info_text {
color: #141414;
font-size: 15px;
font-weight: bold;
}
.card__table {
width: 100%;
border-collapse: collapse;
}
.card__table_header_row {
background-color: #2b9eaa;
}
.card__table_data_row {
border-top: 1px solid #dedede;
}
.card__table_header_row__content {
color: #fff;
font-size: 15px;
font-weight: bold;
padding: 6px 0;
}
.card__table_data_row__content {
padding: 5px 0;
}
.card__table_data_row__subcontent {
font-size: 13px;
color: #141414;
width: max-content;
margin-left: 10px;
}
.dashed_b-t {
border-top: 1px dashed #dedede;
}
.card__total_amount {
background-color: #F2F2F2;
padding: 4px 10px 5px;
border-top: 1px solid #dedede;
border-bottom: 1px solid #dedede;
}
.card__amount_section {
padding: 6px 10px 5px;
align-items: flex-start;
}
.card__total_amount__title,
.card__total_amount__title_lg {
font-size: 18px;
color: #141414;
font-weight: bold;
}
.card__declaration {
padding: 6px 10px 5px;
background-color: #D4ECEE;
font-size: 15px;
font-weight: bold;
color: #2b2b2b;
}
.card__signature {
margin-top: 30px;
padding: 0 10px;
}
</style>
</head>
<body>
<div class="card">
<div class="card__header border-b_color">
<img src="https://res.cloudinary.com/exportify/image/upload/v1573547246/ExportifyLogo/exportify_logo_166x31_OG_yhqmrg.svg"
alt="Exportify Logo" class="card__header_img">
<div class="card__header_title">PROFORMA Bill of Supply</div>
</div>
<div class="card__info_flex border-b_color">
<div class="card__info">
<div class="card__info_row">
<span class="card__info_title">Proforma Invoice No.: </span>
<span class="card__info_text">{{PROFORMA_INV_NO}}</span>
</div>
<div class="card__info_row">
<span class="card__info_title">Reference No. & Date.: </span>
<span class="card__info_text">{{REF_NO}}</span>
</div>
<div class="card__info_row">
<span class="card__info_title">Buyer's Order No.: </span>
<span class="card__info_text">{{BUYER_ORDER_NO}}</span>
</div>
<div class="card__info_row">
<span class="card__info_title">Vessel/Flight No.: </span>
<span class="card__info_text">{{VESSEL_NAME}}</span>
</div>
<div class="card__info_row">
<span class="card__info_title">City/Port of Loading: </span>
<span class="card__info_text">{{POL}}</span>
</div>
<div class="card__info_row">
<span class="card__info_title">Terms of Delivery: </span>
<span class="card__info_text">{{DELIVERY_TERMS}}</span>
</div>
</div>
<div class="card__info">
<div class="card__info_row">
<span class="card__info_title">Dated: </span>
<span class="card__info_text">{{INV_DATE}}</span>
</div>
<div class="card__info_row">
<span class="card__info_title">SAIL Date: </span>
<span class="card__info_text">{{SAIL_DATE}}</span>
</div>
<div class="card__info_row">
<span class="card__info_title">Place of Receipt by Shipper: </span>
<span class="card__info_text">{{PLACE_OF_RECEIPT}}</span>
</div>
<div class="card__info_row">
<span class="card__info_title">City/Port of Discharge: </span>
<span class="card__info_text">{{POD}}</span>
</div>
</div>
</div>
<div class="card__info_flex" style="padding-bottom: 0;">
<div class="card__info"></div>
<div class="card__info mb-3">
<span class="card__info_title">Buyer (Bill to)</span>
</div>
</div>
<div class="card__info_flex" style="padding-top: 0;">
<div class="card__info">
<div class="card__info_text mb-3">XPORTIFY TECHNOLOGIES PRIVATE LIMITED</div>
<div class="card__info_title mb-5" style="line-height: 20px; width: 85%;">
3rd Floor, 313-314, A/3, BGTA Ganga Premises, Wadala Truck Terminal, Near Wadala RTO, Wadala East,
Mumbai
</div>
<div class="card__info_row">
<span class="card__info_title">GSTIN/UIN: </span>
<span class="card__info_text">27AAACX2283M1ZX</span>
</div>
<div class="card__info_row">
<span class="card__info_title">PAN No: </span>
<span class="card__info_text">AAACX2283M</span>
</div>
<div class="card__info_title mb-5">State Name: Maharashtra, Code: 27</div>
<div class="card__info_row">
<span class="card__info_title">CIN: </span>
<span class="card__info_text">U74999MH2017PTC295494</span>
</div>
</div>
<div class="card__info">
<div class="card__info_text mb-3">{{BUYER_COMPANY_NAME}}</div>
<div class="card__info_title mb-5" style="line-height: 20px;">
{{BUYER_ADDRESS}}
</div>
<div class="card__info_row">
<span class="card__info_title">GSTIN/UIN: </span>
<span class="card__info_text">{{BUYER_GST}}</span>
</div>
<div class="card__info_title mb-5">State Name: {{BUYER_STATE}}</div>
<div class="card__info_row">
<span class="card__info_title">Place of Supply: </span>
<span class="card__info_text">{{BUYER_PLACE_OF_SUPPLY}}</span>
</div>
</div>
</div>
<table class="card__table mb-5">
<thead>
<tr class="card__table_header_row">
<th>
<div class="card__table_header_row__content" style="padding-left: 10px;">Sr. No.</div>
</th>
<th>
<div class="card__table_header_row__content">Description of Services</div>
</th>
<th>
<div class="card__table_header_row__content">HSN/SAC</div>
</th>
<th>
<div class="card__table_header_row__content">Quantity</div>
</th>
<th>
<div class="card__table_header_row__content">Rate</div>
</th>
<th>
<div class="card__table_header_row__content">Per</div>
</th>
<th>
<div class="card__table_header_row__content text-right" style="padding-right: 10px;">
Amount
</div>
</th>
</tr>
</thead>
<tbody>
<tr class="card__table_data_row">
<td>
<div class="card__table_data_row__content" style="padding-left: 10px;">1</div>
</td>
<td>
<div class="card__table_data_row__content">Freight Charges</div>
<div class="card__table_data_row__subcontent">$ 1938/20x1x#74.97</div>
</td>
<td>
<div class="card__table_data_row__content">996521</div>
</td>
<td>
<div class="card__table_data_row__content">1</div>
</td>
<td>
<div class="card__table_data_row__content">7,900.00</div>
</td>
<td>
<div class="card__table_data_row__content">Container</div>
</td>
<td>
<div class="card__table_data_row__content text-right pr-10" style="right: 0;">USD
7,900.00
</div>
</td>
</tr>
</tbody>
</table>
<div class="card__total_amount mb-5">
<div class="card__total_amount__title">Total</div>
<div class="card__total_amount__title_lg">USD 7,900.00</div>
</div>
<div class="card__amount_section border-b mb-3">
<div class="card__amount_section_flex">
<div class="card__info_title mb-3">Amount Chargeable (in words)</div>
<div class="card__info_text">USD Seven Thousand Nine Hundred Only</div>
</div>
<div class="card__amount_section_flex">
<div class="card__info_title">E & O.E</div>
</div>
</div>
<div class="card__amount_section border-t">
<div class="card__amount_section_flex">
<div class="card__info_title">HSN/SAC</div>
</div>
<div class="card__amount_section_flex">
<div class="card__info_title">Taxable Value</div>
</div>
</div>
<div class="card__amount_section border-t mb-3">
<div class="card__amount_section_flex">
<div class="card__info_title">996521</div>
</div>
<div class="card__amount_section_flex">
<div class="card__info_title">INR 1,45,291.86</div>
</div>
</div>
<div class="card__total_amount mb-5">
<div class="card__total_amount__title" style="font-size: 15px;">Total</div>
<div class="card__total_amount__title_lg" style="font-size: 15px;">INR 1,45,291.86</div>
</div>
<div class="card__info_row" style="padding: 10px;">
<span class="card__info_title">Tax Amount (in words): </span>
<span class="card__info_text">NIL</span>
</div>
<div class="card__declaration">
Declaration
</div>
<div class="card__info_title" style="padding: 10px;">
We declare that this invoice shows the actual price of the goods described and that all particulars are true
and correct.
</div>
<div class="card__info_text" style="padding: 0 10px;">
for XPORTIFY TECHNOLOGIES PRIVATE LIMITED
</div>
<div class="card__signature">
<div class="card__info_title">Authorised Signatory</div>
</div>
</div>
</body>
</html>
I want to convert this HTML file into a PDF using Python.
I have one option of using wkhtmltopdf package but I have to run it using the command line everytime.
Which is the most optimal way of doing this without hampering the flow of my code?
Install pdfkit package
pip install pdfkit
Install wkhtmltopdf https://wkhtmltopdf.org/downloads.html
PDF to HTML in the current folder
import pdfkit
import glob
3.1 Set wkhtmltopdf executable file path
config = pdfkit.configuration(wkhtmltopdf='C:/Program Files/wkhtmltopdf/bin/wkhtmltopdf.exe')
3.2 Convert all html files in the current folder
for file in glob.glob('./*.html'):
pdfkit.from_file(file, file[:-4]+'.pdf', configuration=config)

Comparing two HTML files using Python difflib package

I am having issues with comparing two HTML files using Pythob difflib. While I was able to generate a comparison file that highlighted any changes, when I opened the comparison file, it displayed the raw HTML and CSS script/tags instead of the plain text.
E.g
<Html><Body><div class="a"> Text Here</div></Body></html>
instead of
Text Here
My Python Script is as follows:
import difflib
file1 = open('file1.html', 'r').readlines()
file2 = open('file2.html', 'r').readlines()
diffHTML = difflib.HtmlDiff()
htmldiffs = diffHTML.make_file(file1,file2)
with open('Comparison.html', 'w') as outFile:
outFile.write(htmldiffs)
My input files looks something like this
<!DOCTYPE html>
<html>
<head>
<title>Text here</title>
<style type="text/css">
#media all {
h1 {
margin: 0px;
color: #222222;
}
#page-title {
color: #222222;
font-size: 1.4em;
font-weight: bold;
}
body {
font: 0.875em/1.231 tahoma, geneva, verdana, sans-serif;
padding: 30px;
min-width: 800px;
}
.divider {
margin: 0.5em 15% 0.5em 10%;
border-bottom: 1px solid #000;
}
}
.section.header {
background-color: #303030;
color: #ffffff;
font-weight: bold;
margin: 0 0 5px 0;
padding: 5px 0 5px 5px;
}
.section.subheader {
background-color: #CFCFCF;
color: #000;
font-weight: bold;
padding: 1px 5px;
margin: 0px auto 5px 0px;
}
.response_rule_prefix {
font-style: italic;
}
.exception-scope
{
color: #666666;
padding-bottom: 5px;
}
.where-clause-header
{
color:#AAAA99;
}
.section {
padding: 0em 0em 1.2em 0em;
}
#generated-Time {
padding-top:5px;
float:right;
}
#page-title, #generated-Time {
display: inline-block;
}
</style></head>
<body>
<div id="title-section" class="section ">
<div id="page-branding">
<h1>Title</h1>
</div>
<div id="page-title">
Sub title
</div>
<div id="generated-Time">
Date & Time : Jul 2, 2020 2:42:48 PM
</div>
</div>
<div class="section header">General</div>
<div id="general-section" class="section">
<div class="general-detail-label-container">
<label id="policy-name-label">Text here</label>
</div>
<div class="general-detail-content-container">
<span id="policy-name-content" >Text here</span>
</div>
<div class="general-detail-label-container">
<label id="policy-description-label">Description A :</label>
</div>
<div class="general-detail-content-container">
<span id="policy-description-content""></span>Text here</span>
</div>
<div class="general-detail-label-container">
<label id="policy-label-label" class="general-detail-label">Description b:</label>
</div>
<div class="general-detail-content-container">
<span id="policy-label-content" class="wrapping-text"></span>
</div>
<div class="general-detail-label-container">
<label id="policy-group-label" class="general-detail-label">Group:</label>
</div>
<div class="general-detail-content-container">
<span id="policy-group-content" class="wrapping-text">Text here</span>
</div>
<div class="general-detail-label-container">
<label id="policy-status-label" class="general-detail-label">Status:</label>
</div>
<div class="general-detail-content-container">
<span id="policy-status-content">
<label id="policy-status-message">Active</label>
</span>
</div>
<div class="general-detail-label-container">
<label id="policy-version-label" class="general-detail-label">Version:</label>
</div>
<div class="general-detail-content-container">
<span id="policy-version-content" class="wrapping-text">7</span>
</div>
<div class="general-detail-label-container">
<label id="policy-last-modified-label" class="general-detail-label">Last Modified:</label>
</div>
<div class="general-detail-content-container">
<span id="policy-last-modified-content" class="wrapping-text">Jun 15, 2020 2:41:48 PM</span>
</div>
</div>
</body>
</html>
Assuming that you are only looking for the text changes and not changes to the HTML, you could strip the output of HTML after the comparison. There are a number of ways to achieve this. The two that I first thought of was:
RegEx, because this is native in Python, and
BeautifulSoup, because it was created to read webpages
Create a function, using either of above methods, to strip the output of HTML
e.g. using BeautifulSoup
UPDATE
Reading through the documentation again, it seems to me that the comparison will yield the actual HTML too, however, you could create an additional output that only shows the text changes.
Also to avoid showing the whole document, I've set the context parameter to True
Using BeautifulSoup
import re
import difflib
from bs4 import BeautifulSoup
def remove_html_bs(raw_html):
data = BeautifulSoup(raw_html, 'html.parser')
data = data.findAll(text=True)
def visible(element):
if element.parent.name in ['style', 'script', '[document]', 'head', 'title']:
return False
elif re.match('<!--.*-->', str(element.encode('utf-8'))):
return False
return True
result = list(filter(visible, data))
return result
def compare_files(file1, file2):
"Creates two comparison files"
file1 = file1.readlines()
file2 = file2.readlines()
# Difference line by line - HTML
difference_html = difflib.HtmlDiff(tabsize=8).make_file(file1, file2, context=True, numlines=5)
# Difference line by line - raw
difference_file = set(file1).difference(file2)
# List of differences by line index
difference_index = []
for dt in difference_file:
for index, t in enumerate(file1):
if dt in t:
difference_index.append(f'{index}, {remove_html_bs(dt)[0]}, {remove_html_bs(file2[index])[0]}')
# Write entire line with changes
with open('comparison.html', 'w') as outFile:
outFile.write(difference_html)
# Write only text changes, by index
with open('comparison.txt', 'w') as outFile:
outFile.write('LineNo, File1, File2\n')
outFile.write('\n'.join(difference_index))
return difference_html, difference_file
file1 = open('file1.html', 'r')
file2 = open('file2.html', 'r')
difference_html, difference_file = compare_files(file1, file2)

String variable as href in lxml.builder

I am building HTML table from the list through lxml.builder and striving to make a link in one of the table cells
List is generated in a following way:
with open('some_file.html', 'r') as f:
table = etree.parse(f)
p_list = list()
rows = table.iter('div')
p_list.append([c.text for c in rows])
rows = table.xpath("body/table")[0].findall("tr")
for row in rows[2:]:
p_list.append([c.text for c in row.getchildren()])
HTML file which I parse is the same that is generated further by lxml, i.e. I set up some sort of recursion for testing purposes.
And here is how I build table
from lxml.builder import E
page = (
E.html(
E.head(
E.title("title")
),
E.body(
....
*[E.tr(
*[
E.td(E.a(E.img(src=str(col)))) if ind == 8 else
E.td(E.a(str(col), href=str(col))) if ind == 9 else
E.td(str(col)) for ind, col in enumerate(row)
]
) for row in p_list ]
When I specify link via literals all is going fine.
E.td(E.a("link", href="url_address"))
However, when I try to output list element value (which is https://blahblahblah.com) as a link
E.td(E.a(str(col), href=str(col)))
cell is empty, just nothing is showed in the cell.
If I specify link text as a literal and put str (col) into href, the link is showed normally, but instead of real href it contains the name of the generated html file.
If I output just that col value as a string
E.td(str(col))
it is showed normally, i.e. it is not empty. What is wrong with E.a and E.img elements?
Just noticed that this happens only if I build list from html file. When I build list manually, like this, all is output fine.
p_list = []
p_element = ['id']
p_element.append('value')
p_element.append('value2')
p_list.append(p_element)
Current output (pay attention to <a> and <href> tags)
<html>
<head>
<title>page</title>
</head>
<body>
<style type="text/css">
th {
background-color: DeepSkyBlue;
text-align: center;
vertical-align: bottom;
height: 150px;
padding-bottom: 3px;
padding-left: 5px;
padding-right: 5px;
}
.vertical {
text-align: center;
vertical-align: middle;
width: 20px;
margin: 0px;
padding: 0px;
padding-left: 3px;
padding-right: 3px;
padding-top: 10px;
white-space: nowrap;
-webkit-transform: rotate(-90deg);
-moz-transform: rotate(-90deg);
}</style>
<h1>title</h1>
<p>This is another paragraph, with a</p>
<table border="2">
<tr>
<th>
<div class="vertical">ID</div>
</th>
...
<th>
<div class="vertical">I blacklisted him</div>
</th>
</tr>
<tr>
<td>1020</td>
<td>ТаисияСтрахолет</td>
<td>No</td>
<td>Female</td>
<td>None</td>
<td>Санкт-Петербург</td>
<td>Росiя</td>
<td>None</td>
<td>
<a>
<img src="
"/>
</a>
</td>
<td>
<a href="
">
</a>
</td>
...
</tr>
</table>
</body>
</html>
Desired output
<html>
<head>
<title>page</title>
</head>
<body>
<style type="text/css">
th {
background-color: DeepSkyBlue;
text-align: center;
vertical-align: bottom;
height: 150px;
padding-bottom: 3px;
padding-left: 5px;
padding-right: 5px;
}
.vertical {
text-align: center;
vertical-align: middle;
width: 20px;
margin: 0px;
padding: 0px;
padding-left: 3px;
padding-right: 3px;
padding-top: 10px;
white-space: nowrap;
-webkit-transform: rotate(-90deg);
-moz-transform: rotate(-90deg);
}</style>
<h1>title</h1>
<p>This is another paragraph, with a</p>
<table border="2">
<tr>
<th>
<div class="vertical">ID</div>
</th>
...
<th>
<div class="vertical">I blacklisted him</div>
</th>
</tr>
<tr>
<td>1019</td>
<td>МихаилПавлов</td>
<td>No</td>
<td>Male</td>
<td>None</td>
<td>Санкт-Петербург</td>
<td>Росiя</td>
<td>C.-Петербург</td>
<td>
<a>
<img src="http://i.imgur.com/rejChZW.jpg"/>
</a>
</td>
<td>
link
</td>
...
</tr>
</table>
</body>
</html>
Got it myself. The problem was not in generating but in parsing HTML. Parsing function didn't fetch IMG and A tags nested in TD and these elements of the list were empty. Due to the rigorous logic of the program (fetching from file + fetching from site API) I wasn't able to detect the cause of the issue.
The correct parsing logic should be:
for row in rows[1:]:
data.append([
c.find("a").text if c.find("a") is not None else
c.find("img").attrib['src'] if c.find("img") is not None else
c.text
for c in row.getchildren()
])
Just a guess but it looks like 'str()' could be escaping it. Try E.td(E.a(col, href=col))

Moving data from html form to a chart in a MySQL databse using python

I went through a basic web.py tutorial on how to set up a basic server using MySQL and I have a database named 'testdb' with a chart 'todo' inside of it. The values I'm trying to move into the chart are 'email', 'username', and 'password' from the form in index.html. I get the error message 405 method not allowed. And after breaking past 100+ error messages I'm stumped. I already have columns for the three but i have no value assigned to them.
I'm running on a Linux Ubuntu Desktop, in the gedit test editor.
MySQL chart as seen in Terminal
| Field | Type | Null | | Default |
_________________________________________________________
| email | varchar(50) | YES | | NULL |
| username | varchar(50) | YES | | NULL |
| password | varchar(50) | YES | | NULL |
The code for index.html
$def with (todos)
$for todo in todos:
<title>Red.Koala</title>
<body id="background">
<div id="header">
-> Red.Koala <-
</div>
<div id="sign">
<form action="LANForum" method="post">
<table id="tr">
<tr>
Sign In:<br>
</tr>
</table>
Email:<br>
<input id="signbox" type="text" name="email" placeholder="Ex: JoeSmith#example.com" required><br>
Username:<br>
<input id="signbox" type="text" name="username" placeholder="Username..."required><br>
Password:<br>
<input id="signbox" height="30" type="text" name="password" placeholder="Password..." required><br>
<input value="Submit" type="submit" id="signbox" style="margin-top: 20px; right: 5px; width: 50%;">
</div>
</form>
</div>
<div id="copyright">
#HaydnFleming
</div>
<style>
#background {
background: #ff3333;
margin-left: auto;
margin-right: auto;
width: 960px;
}
#header {
border: solid #cc0000;
border-width: 15px;
font-size: 65px;
text-align: center;
}
#sign{
border: solid #cc0000;
border-width: 15px;
font-size: 30px;
text-align: center;
width: 30%;
height: 40%;
margin:auto;
margin-top: 40px ;
margin-bottom: 20%;
}
#signbox {
width: 60%;
height: 10%;
margin-top: 10px;
}
#copyright{
bottom: 2%;
text-align: center;
position: absolute;
width:100%;
}
#tr {
border: solid #cc0000;
position: relative;
width: 100%;
border-bottom: transparent;
border-left: transparent;
border-right: transparent;
}
.dscrptn{
text-align: center;
width: 100%;
position: relative;
z-index: 5;
}
.dscrptn_p{
width: 60%;
text-align: center;
margin: auto;
margin-top: 4%;
position: relative;
border: solid #cc0000;
border-width: 0px;
}
</style>
</body>
The code for Code.py (using Web.py)
import web
render = web.template.render('templates/')
urls = (
'/', 'index', '/add', 'add', '/LANForum', 'LANForum'
)
db = web.database(dbn = 'mysql', user = 'testuser', pw = 'MServer', db = 'testdb')
class index:
def GET(self):
todos = db.select('todo')
return render.index(todos)
class add:
def POST(self):
i = web.input()
n = db.insert('todo', email=i.email, username=i.username, password=i.password)
raise web.seeother('LANForum')
if __name__ == "__main__":
app = web.application(urls, globals())
app.run()
The code for LANForum.html
<link type="text/css" rel="stylesheet" href="LANForum.css">
<title>Red.Koala</title>
<body id="background"> <!-- Only used to set up the main background colors/any images.-->
<div id="title">
<!-- I decided to use the title as a nest
for all the other divs to rest in that way
they could all run off of the same measurements,
and they could line up easily using similar
percentages and pixel distances-->
<down>Red.Koala</down>
<!--"down" is just a random word I used as a tag so that I could
separate the text form the divs. also the "+ [Current Forum Title]"
part means that when the website is actually running you will be able
to see the website name and then the forum title. example: Red.Koala/Funny -->
<div id="user">[This is where the current users information will go]</div><!-- User information in the top right-->
<div id="list">[This is where a list of pop. forum topics will go.]</div>
<!-- List of popular forums located on the middle right of the page.-->
<div id="mforum">[This is where your main forum will go.]
<!-- Forum that you are currently on displayed in the big box in the middle of the screen.-->
<div id="post-bar"><!-- Section of the Forum section where any and all posting/commenting will be done -->
<form action="" onsubmit=""><!-- Assigned to whatever server space we select -->
<textarea id="text-box" type="text" placeholder="Post..."></textarea>
<!-- The box which the user will type in to post or comment things -->
<input id="post-button" type="submit" value="Post" required>
</form>
</div>
</div>
</div>
<div id="buttons"><!-- page numbers on display for the user to switch between pages.--> <!--currently not working as buttons-->
<table>
<tr>
<td>1</td>
<td>2</td>
<td>3</td>
<td>4</td> <!-- Each cell contains a different page number-->
<td>5</td>
<td>6</td>
<td>7</td>
<td>8</td>
<td>9</td>
<td>10</td>
<td>...</td>
</tr>
</table>
</div>
</body>

Categories

Resources