So I have code that generates an SVG xml:
def get_map(locations):
max_value = max(locations.iteritems(), key=operator.itemgetter(1))[1]
base = "fill:#%s;fill-rule:nonzero;"
f = open("../static/img/usaHigh.svg", "r").read()
soup = BeautifulSoup(f)
for l in locations:
#do stuff.....
return str(soup)
but now I want to serve this svg through flask. To be able to do something like
<img src='sometemp.svg'>
so my flask function would look like:
def serve_content():
return render_template('sometemplate.html', map=get_map())
Is this possible without creating temp files?
EDIT: here's a chunk of the output:
<?xml version='1.0' encoding='utf-8'?>
<!-- (c) ammap.com | SVG map of USA -->
<svg xmlns="http://www.w3.org/2000/svg" xmlns:amcharts="http://amcharts.com/ammap" xmlns:xlink="http://www.w3.org/1999/xlink" version="1.1">
<defs>
<style type="text/css">
.land
{
fill: #CCCCCC;
fill-opacity: 1;
stroke:white;
stroke-opacity: 1;
stroke-width:0.5;
}
</style>
<!--{id:"US-AK"},{id:"US-AL"},{id:"US-AR"},{id:"US-AZ"},{id:"US-CA"},{id:"US-CO"},{id:"US-CT"},{id:"US-DC"},{id:"US-DE"},{id:"US-FL"},{id:"US-GA"},{id:"US-HI"},{id:"US-IA"},{id:"US-ID"},{id:"US-IL"},{id:"US-IN"},{id:"US-KS"},{id:"US-KY"},{id:"US-LA"},{id:"US-MA"},{id:"US-MD"},{id:"US-ME"},{id:"US-MI"},{id:"US-MN"},{id:"US-MO"},{id:"US-MS"},{id:"US-MT"},{id:"US-NC"},{id:"US-ND"},{id:"US-NE"},{id:"US-NH"},{id:"US-NJ"},{id:"US-NM"},{id:"US-NV"},{id:"US-NY"},{id:"US-OH"},{id:"US-OK"},{id:"US-OR"},{id:"US-PA"},{id:"US-RI"},{id:"US-SC"},{id:"US-SD"},{id:"US-TN"},{id:"US-TX"},{id:"US-UT"},{id:"US-VA"},{id:"US-VT"},{id:"US-WA"},{id:"US-WI"},{id:"US-WV"},{id:"US-WY"}-->
</defs>
<g>
<path id="US-AK" title="Alaska" class="land" d="M456.18,521.82l-0.1,4.96l-0.1,4.94l-0.1,4.92l-0.1,4.9l-0.1,4.88l-0.1,4.86l-0.1,4.84l-0.1,4.82l-0.1,4.8l-0.1,4.78l-0.1,4.77l-0.09,4.75l-0.1,4.73l-0.09,4.71l-0.09,4.7l-0.09,4.68l-0.09,4.66l-0.09,4.65l-0.09,4.64l-0.09,4.62l-0.09,4.61l-0.09,4.59l-0.09,4.58l-0.09,4.56l-0.09,4.55l-0.09,4.54l-0.09,4.53l-0.09,4.51l-0.09,4.5l-0.09,4.49l-0.09,4.48l-0.09,4.47l1.8,0.66l1.79,0.65l0.57,-1.23l1.93,0.97l1.69,0.85l1.09,-1.06l1.18,-1.14l1.58,-0.07l1.77,-0.09l1.18,-0.06l0,0.98l-0.44,1.63l-
So I'm trying to pipe that back to the page template without first saving it into a file, and then serving it. Is that possible?
EDIT 2:
this is the routing function
#app.route("/map", methods=["GET", "POST"])
def map():
locations = {"US-NY":60,
"US-FL":30,
"US-CA":100}
#return Response(get_map(locations), mimetype='image/svg+xml')
return render_template("map.html", svg=get_map(locations))
This is my template:
<html>
<body>
this is some of the stuff I want, here's a beautiful map
{{ svg }}
</body>
</html>
And this is what my page looks like on Chrome
So it's not actually showing the svg :'(
Your SVG code is being autoescaped. An appropriate solution is to call Markup(), like so:
return render_template("map.html", svg=Markup(get_map(locations)))
You can use StringIO for this :
from flask import send_file
from StringIO import StringIO
def serve_content(content):
svg_io = StringIO()
svg_io.write(content)
svg_io.seek(0)
return send_file(svg_io, mimetype='image/svg+xml')
For those with no need for an actual file, here's how I make an SVG from scratch with a custom color and serve it up with Flask.
import flask
custom_color_global_variable = 'red'
#app.route('/circle-thin-custom-color.svg', methods=('GET', 'HEAD'))
def circle_thin_custom_color):
"""Thin circle with the color set by a global variable."""
return flask.Response(
"""
<svg
xmlns="http://www.w3.org/2000/svg"
viewBox="0 0 512 512"
><path
fill="{color}"
d="M256 8C119 8 8 119 8 256s111 248 248 248 248-111
248-248S393 8 256 8zm216 248c0 118.7-96.1 216-216 216-118.7
0-216-96.1-216-216 0-118.7 96.1-216 216-216 118.7 0 216 96.1
216 216z"
/></svg>
\n""".format(
color=custom_color_global_variable,
),
mimetype='image/svg+xml'
)
Related
I'm finishing an application by django that generates pdf certificates
it was working great before I pull it into heroke and set up S3 amazom to host static files.
I have an html with the certificate template, and by HTML2PDF I render it to pdf. But it's not showing the background by css, it works just in the tag
Weird that if we opent the image url in s3 amazom it's shown perfectly
here the template css part
<meta charset="utf-8" />
{% load static %}
<style type="text/css">
#page {
size: 1122.52px 1587.4px ;
/*size: A4 landscape;*/
margin: 0cm;
background-image: url({{bg_front}});
height: 1588;
}
</style>
My view:
class ViewPDF(View):
def get(self, request, *args, **kwargs):
data = {}
pdf = True
if kwargs['pk']:
try:
participant = Participant.objects.get(pk=kwargs['pk'])
print(participant.cpf)
if participant.name:
certificate = Certificate.objects.get(pk=participant.certificate.pk)
pathBack = str(certificate.template.template_back.url)
pathFront = str(certificate.template.template_front.url)
print(pathFront)
#
# CONFIGURA OS BACKGROUNDS E TEXTO
#
data['bg_front'] = pathFront
data['bg_back'] = pathBack
setting = certificate.template.settings
start_date = datetime.strftime(certificate.start_date,'%d/%m/%Y')
end_date = datetime.strftime(certificate.start_date,'%d/%m/%Y')
data['text_front'] = setting.replace('<<nome>>',participant.name).replace('<<cpf>>',str(participant.cpf)).replace('<<ch>>',str(certificate.ch)).replace('<<instituicao>>',str(certificate.institution)).replace('<<DataInicio>>',start_date).replace('<<DataFim>>',end_date)
data['cpf'] = participant.cpf
pdf = render_to_pdf('app_certificates/body_front_pdf.html', data)
return HttpResponse(pdf, content_type='application/pdf')
except TypeError as e:
return HttpResponse(e)
Is there a way, using lxml, to insert XML attributes with the right namespace?
For instance, I want to use XLink to insert links in a XML document. All I need to do is to insert {http://www.w3.org/1999/xlink}href attributes in some elements. I would like to use xlink prefix, but lxml generates prefixes like "ns0", "ns1"…
Here is what I tried:
from lxml import etree
#: Name (and namespace) of the *href* attribute use to insert links.
HREF_ATTR = etree.QName("http://www.w3.org/1999/xlink", "href").text
content = """\
<body>
<p>Link to <span>StackOverflow</span></p>
<p>Link to <span>Google</span></p>
</body>
"""
targets = ["https://stackoverflow.com", "https://www.google.fr"]
body_elem = etree.XML(content)
for span_elem, target in zip(body_elem.iter("span"), targets):
span_elem.attrib[HREF_ATTR] = target
etree.dump(body_elem)
The dump looks like this:
<body>
<p>link to <span xmlns:ns0="http://www.w3.org/1999/xlink"
ns0:href="https://stackoverflow.com">stackoverflow</span></p>
<p>link to <span xmlns:ns1="http://www.w3.org/1999/xlink"
ns1:href="https://www.google.fr">google</span></p>
</body>
I found a way to factorize the namespaces by inserting and deleting an attribute in the root element, like this:
# trick to declare the XLink namespace globally (only one time).
body_elem = etree.XML(content)
body_elem.attrib[HREF_ATTR] = ""
del body_elem.attrib[HREF_ATTR]
targets = ["https://stackoverflow.com", "https://www.google.fr"]
for span_elem, target in zip(body_elem.iter("span"), targets):
span_elem.attrib[HREF_ATTR] = target
etree.dump(body_elem)
It's ugly, but it works and I only need to do it one time. I get:
<body xmlns:ns0="http://www.w3.org/1999/xlink">
<p>Link to <span ns0:href="https://stackoverflow.com">StackOverflow</span></p>
<p>Link to <span ns0:href="https://www.google.fr">Google</span></p>
</body>
But the problem remains: how can I turn this "ns0" prefix into "xlink"?
Using register_namespace as suggested by #mzjn:
etree.register_namespace("xlink", "http://www.w3.org/1999/xlink")
# trick to declare the XLink namespace globally (only one time).
body_elem = etree.XML(content)
body_elem.attrib[HREF_ATTR] = ""
del body_elem.attrib[HREF_ATTR]
targets = ["https://stackoverflow.com", "https://www.google.fr"]
for span_elem, target in zip(body_elem.iter("span"), targets):
span_elem.attrib[HREF_ATTR] = target
etree.dump(body_elem)
The result is what I expected:
<body xmlns:xlink="http://www.w3.org/1999/xlink">
<p>Link to <span xlink:href="https://stackoverflow.com">StackOverflow</span></p>
<p>Link to <span xlink:href="https://www.google.fr">Google</span></p>
</body>
I currently have an HTML file and a python file. The python file uses YELP's API and returns JSON data. How do I display that data onto my webpage through HTML? Is there a function like document.getElementById("id").innerHTML = JSONDATA from JavaScript?
Please let me know if you need any more details; this is my first time posting and first time using an API/making a website. I understand the JSON data is not going to look nice but I will put it into a dictionary and sort it later, basically right now I am just wondering how to display data from a Python file into a HTML file. Also, feel free to link any helpful tutorials.
Found the following Node.js code as it was suggested to use Javascript instead, where in this would I put my tokens/secrets? And then how would I call it in my html file... Thank you.
/* require the modules needed */
var oauthSignature = require('oauth-signature');
var n = require('nonce')();
var request = require('request');
var qs = require('querystring');
var _ = require('lodash');
/* Function for yelp call
* ------------------------
* set_parameters: object with params to search
* callback: callback(error, response, body)
*/
var request_yelp = function(set_parameters, callback) {
/* The type of request */
var httpMethod = 'GET';
/* The url we are using for the request */
var url = 'http://api.yelp.com/v2/search';
/* We can setup default parameters here */
var default_parameters = {
location: 'San+Francisco',
sort: '2'
};
/* We set the require parameters here */
var required_parameters = {
oauth_consumer_key : process.env.oauth_consumer_key,
oauth_token : process.env.oauth_token,
oauth_nonce : n(),
oauth_timestamp : n().toString().substr(0,10),
oauth_signature_method : 'HMAC-SHA1',
oauth_version : '1.0'
};
/* We combine all the parameters in order of importance */
var parameters = _.assign(default_parameters, set_parameters, required_parameters);
/* We set our secrets here */
var consumerSecret = process.env.consumerSecret;
var tokenSecret = process.env.tokenSecret;
/* Then we call Yelp's Oauth 1.0a server, and it returns a signature */
/* Note: This signature is only good for 300 seconds after the oauth_timestamp */
var signature = oauthSignature.generate(httpMethod, url, parameters, consumerSecret, tokenSecret, { encodeSignature: false});
/* We add the signature to the list of paramters */
parameters.oauth_signature = signature;
/* Then we turn the paramters object, to a query string */
var paramURL = qs.stringify(parameters);
/* Add the query string to the url */
var apiURL = url+'?'+paramURL;
/* Then we use request to send make the API Request */
request(apiURL, function(error, response, body){
return callback(error, response, body);
});
};
I had a similar situation. I had to show the IAM users of AWS account in a HTML page. I used AWS boto3 Python client to grab all IAM users and write a JSON file. Then from HTML file I read that JSON file and showed all users in a table.
Here is the Python code IAM.PY:
import boto3
import os
import subprocess
import json
iam_client = boto3.client('iam')
def list_user_cli():
list_cmd = "aws iam list-users"
output = subprocess.check_output(list_cmd, shell = True)
output = str(output.decode('ascii'))
return output
def write_json_file(filename, data):
try:
with open(filename, "w") as f:
f.writelines(data)
print(filename + " has been created.")
except Exception as e:
print(str(e))
if __name__ == "__main__":
filename = "iam.json"
data = list_user_cli()
write_json_file(filename, data)
Here is the HTML file IAM.HTML:
<!DOCTYPE html>
<html>
<head>
<!-- Latest compiled and minified CSS -->
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/css/bootstrap.min.css" integrity="sha384-BVYiiSIFeK1dGmJRAkycuHAHRg32OmUcww7on3RYdg4Va+PmSTsz/K68vbdEjh4u" crossorigin="anonymous">
<title>IAM User List</title>
<style type="text/css">
body{
margin: 20px;
}
</style>
</head>
<body>
<div class="container">
<table class="table table-responsive table-hover table-bordered">
<thead>
<tr>
<th>User ID</th>
<th>User Name</th>
<th>Path</th>
<th>Create Date</th>
<th>Arn</th>
</tr>
</thead>
<tbody id="iam_tbody">
</tbody>
</table>
</div>
<script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.2.0/jquery.min.js"></script>
<script type="text/javascript">
$(document).ready(function(){
$.ajax({
method: "GET",
url: "http://localhost/iam/iam.json",
}).done(function(response){
user_list = response.Users;
for(i = 0; i<user_list.length; i++){
tr = "<tr>";
tr += "<td>";
tr += user_list[i]["UserId"];
tr += "</td>";
tr += "<td>";
tr += user_list[i]["UserName"];
tr += "</td>";
tr += "<td>";
tr += user_list[i]["Path"];
tr += "</td>";
tr += "<td>";
tr += user_list[i]["CreateDate"];
tr += "</td>";
tr += "<td>";
tr += user_list[i]["Arn"];
tr += "</td>";
tr += "<tr>";
$("#iam_tbody").append(tr);
}
});
});
</script>
</body>
</html>
Output
You can use Jquery Ajax to call your API, include Jquery in your html file.
$.ajax({
method: "GET",
url: "api_url",
}).done(function( response ) {
$('#divId').append(response);
});
In Your Html File
<div id="divId"></div>
Jquery Ajax Documentation
I have configured xampp on windows to work with python 2.7 and Pygments. My php code is highlighted properly in Pygments on the website. The code has colors, span elements, classes.
That is how it looks:
But I cannot get line numbers.
As I have read tutorials it depends on the linenos value in python script. The value should be either table or inline or 1 or True.
But it does not work for me. I still gives the same final code
<!doctype html>
<html lang="pl">
<head>
<meta charset="UTF-8">
<title>Document</title>
<link rel="stylesheet" href="gh.css">
</head>
<body>
<div class="highlight highlight-php"><pre><code><span class="nv">$name</span> <span class="o">=</span> <span class="s2">"Jaś"</span><span class="p">;</span>
<span class="k">echo</span> <span class="s2">"Zażółć gęślą jaźń, "</span> <span class="o">.</span> <span class="nv">$name</span> <span class="o">.</span> <span class="s1">'.'</span><span class="p">;</span>
<span class="k">echo</span> <span class="s2">"hehehe#jo.io"</span><span class="p">;</span>
</code></pre></div>
</html>
How to add line numbers? I put two files of the website below:
index.py
import sys
from pygments import highlight
from pygments.formatters import HtmlFormatter
# If there isn't only 2 args something weird is going on
expecting = 2;
if ( len(sys.argv) != expecting + 1 ):
exit(128)
# Get the code
language = (sys.argv[1]).lower()
filename = sys.argv[2]
f = open(filename, 'rb')
code = f.read()
f.close()
# PHP
if language == 'php':
from pygments.lexers import PhpLexer
lexer = PhpLexer(startinline=True)
# GUESS
elif language == 'guess':
from pygments.lexers import guess_lexer
lexer = guess_lexer( code )
# GET BY NAME
else:
from pygments.lexers import get_lexer_by_name
lexer = get_lexer_by_name( language )
# OUTPUT
formatter = HtmlFormatter(linenos='table', encoding='utf-8', nowrap=True)
highlighted = highlight(code, lexer, formatter)
print highlighted
index.php
<?php
define('MB_WPP_BASE', dirname(__FILE__));
function mb_pygments_convert_code($matches)
{
$pygments_build = MB_WPP_BASE . '/index.py';
$source_code = isset($matches[3]) ? $matches[3] : '';
$class_name = isset($matches[2]) ? $matches[2] : '';
// Creates a temporary filename
$temp_file = tempnam(sys_get_temp_dir(), 'MB_Pygments_');
// Populate temporary file
$filehandle = fopen($temp_file, "w");
fwrite($filehandle, html_entity_decode($source_code, ENT_COMPAT, 'UTF-8'));
fclose($filehandle);
// Creates pygments command
$language = $class_name ? $class_name : 'guess';
$command = sprintf('C:\Python27/python %s %s %s', $pygments_build, $language, $temp_file);
// Executes the command
$retVal = -1;
exec($command, $output, $retVal);
unlink($temp_file);
// Returns Source Code
$format = '<div class="highlight highlight-%s"><pre><code>%s</code></pre></div>';
if ($retVal == 0)
$source_code = implode("\n", $output);
$highlighted_code = sprintf($format, $language, $source_code);
return $highlighted_code;
}
// This prevent throwing error
libxml_use_internal_errors(true);
// Get all pre from post content
$dom = new DOMDocument();
$dom->loadHTML(mb_convert_encoding('
<pre class="php">
<code>
$name = "Jaś";
echo "Zażółć gęślą jaźń, " . $name . \'.\';
echo "<address>hehehe#jo.io</address>";
</code>
</pre>', 'HTML-ENTITIES', "UTF-8"), LIBXML_HTML_NODEFDTD);
$pres = $dom->getElementsByTagName('pre');
foreach ($pres as $pre) {
$class = $pre->attributes->getNamedItem('class')->nodeValue;
$code = $pre->nodeValue;
$args = array(
2 => $class, // Element at position [2] is the class
3 => $code // And element at position [2] is the code
);
// convert the code
$new_code = mb_pygments_convert_code($args);
// Replace the actual pre with the new one.
$new_pre = $dom->createDocumentFragment();
$new_pre->appendXML($new_code);
$pre->parentNode->replaceChild($new_pre, $pre);
}
// Save the HTML of the new code.
$newHtml = "";
foreach ($dom->getElementsByTagName('body')->item(0)->childNodes as $child) {
$newHtml .= $dom->saveHTML($child);
}
?>
<!doctype html>
<html lang="pl">
<head>
<meta charset="UTF-8">
<title>Document</title>
<link rel="stylesheet" href="gh.css">
</head>
<body>
<?= $newHtml ?>
</body>
</html>
Thank you
While reading the file try readlines:
f = open(filename, 'rb')
code = f.readlines()
f.close()
This way you do the following it will get multiple lines :
formatter = HtmlFormatter(linenos='table', encoding='utf-8', nowrap=True)
Suggestion:
More pythonic way of opening files is :
with open(filename, 'rb') as f:
code = f.readlines()
That's it python context manager closes this file for you.
Solved!
nowrap
If set to True, don’t wrap the tokens at all, not even inside a tag. This disables most other options (default: False).
http://pygments.org/docs/formatters/#HtmlFormatter
I have lots of HTML files. I want to replace some elements, keeping all the other content unchanged. For example, I would like to execute this jQuery expression (or some equivalent of it):
$('.header .title').text('my new content')
on the following HTML document:
<div class=header><span class=title>Foo</span></div>
<p>1<p>2
<table><tr><td>1</td></tr></table>
and have the following result:
<div class=header><span class=title>my new content</span></div>
<p>1<p>2
<table><tr><td>1</td></tr></table>
The problem is, all parsers I’ve tried (Nokogiri, BeautifulSoup, html5lib) serialize it to something like this:
<html>
<head></head>
<body>
<div class=header><span class=title>my new content</span></div>
<p>1</p><p>2</p>
<table><tbody><tr><td>1</td></tr></tbody></table>
</body>
</html>
E.g. they add:
html, head and body elements
closing p tags
tbody
Is there a parser that satisfies my needs? It should work in either Node.js, Ruby or Python.
I highly recommend the pyquery package, for python. It is a jquery-like interface layered ontop of the extremely reliable lxml package, a python binding to libxml2.
I believe this does exactly what you want, with a quite familiar interface.
from pyquery import PyQuery as pq
html = '''
<div class=header><span class=title>Foo</span></div>
<p>1<p>2
<table><tr><td>1</td></tr></table>
'''
doc = pq(html)
doc('.header .title').text('my new content')
print doc
Output:
<div><div class="header"><span class="title">my new content</span></div>
<p>1</p><p>2
</p><table><tr><td>1</td></tr></table></div>
The closing p tag can't be helped. lxml only keeps the values from the original document, not the vagaries of the original. Paragraphs can be made two ways, and it chooses the more standard way when doing serialization. I don't believe you'll find a (bug-free) parser that does better.
Note: I'm on Python 3.
This will only handle a subset of CSS selectors, but it may be enough for your purposes.
from html.parser import HTMLParser
class AttrQuery():
def __init__(self):
self.repl_text = ""
self.selectors = []
def add_css_sel(self, seltext):
sels = seltext.split(" ")
for selector in sels:
if selector[:1] == "#":
self.add_selector({"id": selector[1:]})
elif selector[:1] == ".":
self.add_selector({"class": selector[1:]})
elif "." in selector:
html_tag, html_class = selector.split(".")
self.add_selector({"html_tag": html_tag, "class": html_class})
else:
self.add_selector({"html_tag": selector})
def add_selector(self, selector_dict):
self.selectors.append(selector_dict)
def match_test(self, tagwithattrs_list):
for selector in self.selectors:
for condition in selector:
condition_value = selector[condition]
if not self._condition_test(tagwithattrs_list, condition, condition_value):
return False
return True
def _condition_test(self, tagwithattrs_list, condition, condition_value):
for tagwithattrs in tagwithattrs_list:
try:
if condition_value == tagwithattrs[condition]:
return True
except KeyError:
pass
return False
class HTMLAttrParser(HTMLParser):
def __init__(self, html, **kwargs):
super().__init__(self, **kwargs)
self.tagwithattrs_list = []
self.queries = []
self.matchrepl_list = []
self.html = html
def handle_starttag(self, tag, attrs):
tagwithattrs = dict(attrs)
tagwithattrs["html_tag"] = tag
self.tagwithattrs_list.append(tagwithattrs)
if debug:
print("push\t", end="")
for attrname in tagwithattrs:
print("{}:{}, ".format(attrname, tagwithattrs[attrname]), end="")
print("")
def handle_endtag(self, tag):
try:
while True:
tagwithattrs = self.tagwithattrs_list.pop()
if debug:
print("pop \t", end="")
for attrname in tagwithattrs:
print("{}:{}, ".format(attrname, tagwithattrs[attrname]), end="")
print("")
if tag == tagwithattrs["html_tag"]: break
except IndexError:
raise IndexError("Found a close-tag for a non-existent element.")
def handle_data(self, data):
if self.tagwithattrs_list:
for query in self.queries:
if query.match_test(self.tagwithattrs_list):
line, position = self.getpos()
length = len(data)
match_replace = (line-1, position, length, query.repl_text)
self.matchrepl_list.append(match_replace)
def addquery(self, query):
self.queries.append(query)
def transform(self):
split_html = self.html.split("\n")
self.matchrepl_list.reverse()
if debug: print ("\nreversed list of matches (line, position, len, repl_text):\n{}\n".format(self.matchrepl_list))
for line, position, length, repl_text in self.matchrepl_list:
oldline = split_html[line]
newline = oldline[:position] + repl_text + oldline[position+length:]
split_html = split_html[:line] + [newline] + split_html[line+1:]
return "\n".join(split_html)
See the example usage below.
html_test = """<div class=header><span class=title>Foo</span></div>
<p>1<p>2
<table><tr><td class=hi><div id=there>1</div></td></tr></table>"""
debug = False
parser = HTMLAttrParser(html_test)
query = AttrQuery()
query.repl_text = "Bar"
query.add_selector({"html_tag": "div", "class": "header"})
query.add_selector({"class": "title"})
parser.addquery(query)
query = AttrQuery()
query.repl_text = "InTable"
query.add_css_sel("table tr td.hi #there")
parser.addquery(query)
parser.feed(html_test)
transformed_html = parser.transform()
print("transformed html:\n{}".format(transformed_html))
Output:
transformed html:
<div class=header><span class=title>Bar</span></div>
<p>1<p>2
<table><tr><td class=hi><div id=there>InTable</div></td></tr></table>
Ok I have done this in a few languages and I have to say the best parser I have seen that preserves whitespace and even HTML comments is:
Jericho which is unfortunately Java.
That is Jericho knows how to parse and preserve fragments.
Yes I know its Java but you could easily make a RESTful service with a tiny bit of Java that would take the payload and convert it. In the Java REST service you could use JRuby, Jython, Rhino Javascript etc. to coordinate with Jericho.
You can use Nokogiri HTML Fragment for this:
fragment = Nokogiri::HTML.fragment('<div class=header><span class=title>Foo</span></div>
<p>1<p>2
<table><tr><td>1</td></tr></table>')
fragment.css('.title').children.first.replace(Nokogiri::XML::Text.new('HEY', fragment))
frament.to_s #=> "<div class=\"header\"><span class=\"title\">HEY</span></div>\n<p>1</p><p>2\n</p><table><tr><td>1</td></tr></table>"
The problem with the p tag persists, because it is invalid HTML, but this should return your document without html, head or body and tbody tags.
With Python - using lxml.html is fairly straight forward:
(It meets points 1 & 3, but I don't think much can be done about 2, and handles the unquoted class='s)
import lxml.html
fragment = """<div class=header><span class=title>Foo</span></div>
<p>1<p>2
<table><tr><td>1</td></tr></table>
"""
page = lxml.html.fromstring(fragment)
for span in page.cssselect('.header .title'):
span.text = 'my new value'
print lxml.html.tostring(page, pretty_print=True)
Result:
<div>
<div class="header"><span class="title">my new content</span></div>
<p>1</p>
<p>2
</p>
<table><tr><td>1</td></tr></table>
</div>
This is a slightly separate solution but if this is only for a few simple instances then perhaps CSS is the answer.
Generated Content
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
<html>
<head>
<style type="text/css">
#header.title1:first-child:before {
content: "This is your title!";
display: block;
width: 100%;
}
#header.title2:first-child:before {
content: "This is your other title!";
display: block;
width: 100%;
}
</style>
</head>
<body>
<div id="header" class="title1">
<span class="non-title">Blah Blah Blah Blah</span>
</div>
</body>
</html>
In this instance you could just have jQuery swap the classes and you'd get the change for free with css. I haven't tested this particular usage but it should work.
We use this for things like outage messages.
If you're running a Node.js app, this module will do exactly what you want, a JQuery style DOM manipulator: https://github.com/cheeriojs/cheerio
An example from their wiki:
var cheerio = require('cheerio'),
$ = cheerio.load('<h2 class="title">Hello world</h2>');
$('h2.title').text('Hello there!');
$('h2').addClass('welcome');
$.html();
//=> <h2 class="title welcome">Hello there!</h2>