Zope PostgreSQL variable with HTML and DTML - python

I have a postgresql db table called blog_post and in that table a column called post_main. That column stores the entire blog post article, including various HTML and DTML tags.
For reference (and yes, I know it's old), this is Zope 2.13 with PostgreSQL 8.1.19
For example:
<p>This is paragraph 1</p>
<dtml-var "blog.sitefiles.post.postimg1(_.None, _)">
<p>This is paragraph 2</p>
The dtml-var tag is telling Zope to insert the contents of the dtml-document postimg1 between the two paragraphs.
OK, no problem. I am storing this data without issue in the postgres db table, exactly as it was entered, and I am running a ZSQL Method via a <dtml-in zsqlmethod> tag that surrounds the entire dtml-document, in order to be able to call to the variables I need in the page.
Normally, and without either HTML code OR especially without DTML tags, it's no issue to insert the data into the web page. You do this via &dtml-varname; if you have no html tags and just want a plain text output, OR you do <dtml-var varname> if you want the data to be rendered and shown as proper html.
Here's the problem
Zope is just posting the <dtml-var "blog.sitefiles.post.postimg1(_.None, _)"> line to the html page instead of processing it like when I type it into the dtml-doc directly.
What I need:
I need the code stored in the post_main column (referenced above as varname) to be processed as if I typed it directly into the dtml-document, so that the <dtml-var> tags work the way they are supposed to work.

So, you have a variable that contains a DTML Document, and you want to execute that document and insert the results?
To be honest, I'm not sure that's possible in DTML alone, because you generally don't want to execute code contained in strings. This carries the same danger as calling eval() or exec() on user-supplied strings: anyone who can control the string gets arbitrary code execution on the Zope instance. It's the equivalent of storing PHP code in your database and executing it.
Frankly, I'm surprised you're using DTML on Zope 2.13 at all, rather than PageTemplates, but I assume you've got a good reason for it.
If you want to interpret the value of a DTML variable rather than just insert it, you'll need to explicitly do the interpreting, using something like:
from DocumentTemplate.DT_HTML import HTML
return HTML(trusted_dtml_string)
The problem with this is that you can't do it in a Script (Python) through the web, because of the security concerns. If you do this as an external method or filesystem code it's very likely that you'll allow arbitrary code execution on your server.
I'm afraid my only recommendation is to avoid doing this; it's very difficult to get right, and errors can be catastrophic. I'd strongly suggest you do not store DTML tags as part of your blog articles.
As an alternative, if you have a fixed number of delegations to DTML methods, I recommend writing a Python script, such as:
## Script (Python) "parse_variables"
##bind container=container
##bind context=context
##bind namespace=
##bind script=script
##bind subpath=traverse_subpath
##parameters=post, _
##title=
##
post = post.replace("##POST_IMAGE##", context.postimg(None, _))
return post
And then calling that with your variable that contains the user-supplied data, like <dtml-var expr="parse_variables(data, _)">
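The same placeholder idea can be sketched in plain Python, outside of Zope. This is only an illustration under assumptions: the PLACEHOLDERS mapping, render_post, render_post_image and the image path are hypothetical names, and in Zope each renderer would delegate to a DTML method rather than return a fixed string.

```python
import re

# Hypothetical renderer: returns trusted HTML for one placeholder.
# In Zope this would call a DTML method such as context.postimg(None, _).
def render_post_image():
    return '<img src="/images/post1.png" alt="">'

# Fixed whitelist of placeholder names (hypothetical) and their renderers.
PLACEHOLDERS = {
    "POST_IMAGE": render_post_image,
}

# Matches tokens such as ##POST_IMAGE##
TOKEN = re.compile(r"##([A-Z_]+)##")

def render_post(post):
    """Replace known ##NAME## tokens; leave unknown tokens untouched."""
    def substitute(match):
        renderer = PLACEHOLDERS.get(match.group(1))
        return renderer() if renderer else match.group(0)
    return TOKEN.sub(substitute, post)
```

Because only names in the whitelist are ever expanded, the stored article text can never trigger code that was not explicitly registered.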

Related

How to dynamically inject HTML code in Django

In a project of mine I need to create an online encyclopedia. To do so, I need to create a page for each entry file. The entries are all written in Markdown, so I have to convert them to HTML before sending them to the website. I didn't want to use external libraries for this, so I wrote my own Python code that receives a Markdown file and returns a list with all the lines already formatted as HTML. The problem now is that I don't know how to inject this code into the template I have in Django; when I pass the list to it, the lines are just printed like normal text. I know I could make my function write to an .html file, but I don't think that's a great solution in terms of scalability.
Is there a way to dynamically inject HTML in Django? Is there a "better" approach to my problem?
You could use the safe filter in your template! So it would look like that.
Assuming you have your html in a string variable called my_html then in your template just write
{{ my_html | safe }}
The safe filter is built into Django's template language, so there is nothing to import or load.
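To see why the list was printed as plain text in the first place, here is a rough standard-library illustration of what autoescaping does to a value that has not been marked safe. It is a sketch only: html.escape stands in for the template engine's escaping, and html_lines is a hypothetical example of the converter's output.

```python
import html

# Hypothetical output of the Markdown converter: one HTML string per line.
html_lines = ["<h1>Title</h1>", "<p>Some <em>text</em></p>"]

# Join the list into a single string before putting it in the context.
my_html = "\n".join(html_lines)

# Roughly what autoescaping does when the value is not marked safe:
# the angle brackets become entities, so the browser shows the tags as text.
escaped = html.escape(my_html)
print(escaped)
```

Only mark content as safe when you control how the HTML was produced, since the filter switches this protection off for that value.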

Best way to store web page content in database using Django and a single template for web pages

I'm building a web site and the bulk of the content will be the same general type and layout on the page. I'm going to use a single template to handle each post and the actual content will be stored in a database.
The content will just be html paragraphs, headers, sub headers, different lists, quotes, code blocks, etc.
Web pages will typically be the same or at least similar. All html components should follow the same guidelines to make sure everything looks and feels the same. Currently I'll be the only author, but in the future I plan to incorporate other authors as well.
At first I thought, just copy and paste this html content into a textfield in the database and I can add new posts/articles on the admin site.
Then I thought, maybe use a textfield and paste in JSON: a list of items, each with a 'type' and a 'content' field. The single template page could then iterate over this list and display the content based on the 'type'. My idea here is that it would shorten the data I have to add to the database by stripping the html tags out of the equation.
Considering I hope to have future authors as well, just curious of some ideas on how I can accomplish this to make it easy for myself to post new content.
That sounds pretty much exactly like the example of this fantastic tutorial by Miguel Grinberg. He sets up a flask environment to be used as his personal blog. With user log in and everything you would need.

How do you include flask/jinja2 code inside a markdown file?

I am using a markdown editor which is converted by
post_body = markdown(text_from_markdown_editor)
but when i render the html, the actual jinja2 code is displayed
This is a post by {{ post.author }}
instead of the actual value.
I've been seeing this issue come up a lot lately in various different places, both in relation to Jinja and Django Templates. There seems to be a fundamental misunderstanding (among some users) about how templates systems work and how that relates to Markdown text which is rendered to HTML and inserted into a template. I’ll try to explain this clearly. Note that while the answer below applies to most templating systems (including Jinja and Django), the examples use Jinja for illustrative purposes (after all, the original question specifically asks about Jinja). Simply adapt the code to match the API of your templating system of choice, and it should work just as well.
First of all, Markdown has no knowledge of template syntax. In fact, Markdown has been around longer than Jinja, Django or various other popular templating systems. Additionally, the Markdown Syntax Rules make no mention of template syntax. Therefore, your template syntax will not be processed simply by passing some Markdown text which contains template syntax through a Markdown parser. The template syntax needs to be processed separately by the template engine. For example:
from jinja2 import Environment
# Set up a new template environment
env = Environment()
# Create template with the markdown source text
template = env.from_string(text_from_markdown_editor)
# Render that template. Be sure to pass in the context (post in this instance).
template_processed_markdown = template.render(post=post)
# Now pass the Markdown text through the Markdown engine:
post_body = markdown(template_processed_markdown)
Note that the above first processes the template syntax, then parses the Markdown. In other words, the output of the template processing is still Markdown text with the tags replaced by the appropriate values. Only in the last line is the Markdown text converted to HTML by the Markdown parser. If you want the order of processing to be reversed, you will need to switch the code around to run the Markdown parser first and then pass the output of that through the template processor.
I assume that some of the confusion comes from people passing the Markdown text through a templating system. Shouldn’t that cause the template syntax to get processed? In short, No.
At its core, a templating system takes a template and a context. It then finds the various tags in the template and replaces those tags with the matching data provided in the context. However, the template has no knowledge about the data in the context and does no processing of that data. For example, this template:
Hello, {{ name }}!
And this context:
output = template(name='John')
Would result in the following output:
Hello, John!
However, if the context was this instead:
output = template(name='{{some_template_syntax}}')
then the output would be:
Hello, {{some_template_syntax}}!
Note that while the data in the context contained template syntax, the template did not process that data. It simply considered it a value and inserted it as-is into the template in the appropriate location. This is normal and correct behavior.
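The same behavior can be observed with nothing but the standard library. The sketch below uses string.Template instead of Jinja (so the placeholder syntax is $name rather than {{ name }}); the variable names are only illustrative.

```python
from string import Template

template = Template("Hello, $name!")

# A plain value is substituted as expected.
plain = template.substitute(name="John")

# A value that itself looks like template syntax is inserted verbatim;
# the engine does not scan the substituted data for further placeholders.
nested = template.substitute(name="$other")

print(plain)
print(nested)
```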
Sometimes however, you may have a legitimate need for a template to do some additional processing on some data passed to the template. For that reason, the template system offers filters. When given a variable in the context, the filter will process the data contained in that variable and then insert that processed data in the template. For example, to ensure that the name in our previous example is capitalized, the template would look like the following:
Hello, {{ name|capitalize }}!
Passing in the context output = template(name='john') (note that the name is lowercase), we then get the following output:
Hello, John!
Note that the data in the name variable was processed by having the first letter capitalized, which is the function of Jinja’s built-in filter capitalize. However, that filter does not process template syntax, and therefore passing template syntax to that filter will not cause the template syntax to be processed.
The same concept applies to any markdown filter. Such a filter only parses the provided data as Markdown text and returns HTML text which is then placed into the template. No processing of template syntax would happen in such a scenario. In fact, doing so could result in a possible security issue, especially if the Markdown text is being provided by untrusted users. Therefore, any Markdown text which contains template syntax must have the template syntax processed separately.
However, there is a note of caution. If you are writing documentation which includes examples of Template syntax in them as code blocks (like the Markdown source for this answer), the templating system is not going to know the difference and will process those tags just like any template syntax not in a code block. If the Markdown processing was done first, so that the resulting HTML was passed to the templating system, that HTML would still contain unaltered template syntax within the code blocks which would still be processed by the templating system. This is most likely not what is desired in either case. As a workaround, one could conceivably create some sort of Markdown Extension which would add syntax processing to the Markdown processor itself. However, the mechanism for doing so would differ depending on which Markdown processor one is using and is beyond the scope of this question/answer.

Convert Wikipedia/MediaWiki's code into HTML using python

I am trying to grab content from Wikipedia and use the HTML of the article. Ideally I would also like to be able to alter the content (eg, hide certain infoboxes etc).
I am able to grab page content using mwclient:
>>> import mwclient
>>> site = mwclient.Site('en.wikipedia.org')
>>> page = site.Pages['Samuel_Pepys']
>>> print(page.text())
{{Redirect|Pepys}}
{{EngvarB|date=January 2014}}
{{Infobox person
...
But I can't see a relatively simple, lightweight way to translate this wikicode into HTML using python.
Pandoc is too much for my needs.
I could just scrape the original page using Beautiful Soup but that doesn't seem like a particularly elegant solution.
mwparserfromhell might help in the process, but I can't quite tell from the documentation if it gives me anything I need and don't already have.
I can't see an obvious solution on the Alternative Parsers page.
What have I missed?
UPDATE: I wrote up what I ended up doing, following the discussion below.
page="""<html>
your pretty html here
<div id="for_api_content">%s</div>
</html>"""
Now you can grab your raw content with your API and just call
generated_page = page%api_content
This way you can design any HTML you want and just insert the API content in a designed spot.
Those APIs that you are using are designed to return raw content so it's up to you to style how you want the raw content to be displayed.
UPDATE
Since you showed me the actual output you are dealing with I realize your dilemma. However luckily for you there are modules that already parse and convert to HTML for you.
There is one called mwlib that will parse the wiki and output to HTML, PDF, etc. You can install it with pip using the install instructions. This is probably one of your better options since it was created in cooperation between Wikimedia Foundation and PediaPress.
Once you have it installed you can use the writer method to do the dirty work.
def writer(env, output, status_callback, **kwargs): pass
Here are the docs for this module: http://mwlib.readthedocs.org/en/latest/index.html
And you can set attributes on the writer object to set the filetype (HTML, PDF, etc).
writer.description = 'PDF documents (using ReportLab)'
writer.content_type = 'application/pdf'
writer.file_extension = 'pdf'
writer.options = {
    'coverimage': {
        'param': 'FILENAME',
        'help': 'filename of an image for the cover page',
    }
}
I don't know what the rendered html looks like but I would imagine that it's close to the actual wiki page. But since it's rendered in code I'm sure you have control over modifications as well.
I would go with HTML parsing: page content is reasonably semantic (class="infobox" and such), and there are classes explicitly meant to demarcate content which should not be displayed in alternative views (the first rule of the print stylesheet might be interesting).
That said, if you really want to manipulate wikitext, the best way is to fetch it, use mwparserfromhell to drop the templates you don't like, and use the parse API to get the modified HTML. Or use the Parsoid API which is a partial reimplementation of the parser returning XHTML/RDFa which is richer in semantic elements.
At any rate, trying to set up a local wikitext->HTML converter is by far the hardest way you can approach this task.
The mediawiki API contains a (perhaps confusingly named) parse action that in effect renders wikitext into HTML. I find that mwclient's faithful mirroring of the API structure sometimes actually gets in the way. There's a good example of just using requests to call the API to "parse" (aka render) a page given its title.
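A minimal sketch of calling the parse action directly is shown here. It uses urllib from the standard library rather than requests, and the function names and User-Agent string are made up for illustration. It assumes the English Wikipedia endpoint and formatversion=2, where the rendered HTML is returned as a plain string under parse.text.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"

def build_parse_url(title):
    """Build the URL for action=parse, asking only for the rendered HTML."""
    params = {
        "action": "parse",
        "page": title,
        "prop": "text",
        "format": "json",
        "formatversion": "2",
    }
    return API_URL + "?" + urlencode(params)

def fetch_html(title):
    """Fetch the rendered HTML of a page (performs a network request)."""
    request = Request(
        build_parse_url(title),
        headers={"User-Agent": "example-script/0.1"},  # hypothetical agent
    )
    with urlopen(request) as response:
        data = json.load(response)
    return data["parse"]["text"]
```

Calling fetch_html('Samuel Pepys') would return the article body as HTML, which can then be post-processed with an HTML parser to hide infoboxes and similar elements.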

Search index for flat HTML pages

I'm looking to add search capability into an existing entirely static website. Likely, the new search functionality itself would need to be dynamic, as the search index would need to be updated periodically (as people make changes to the static content), and the search results will need to be dynamically produced when a user interacts with it. I'd hope to add this functionality using Python, as that's my preferred language, though am open to ideas.
The Google Web Search API won't work in this case because the content being indexed is on a private network. Django haystack won't work for this case, as that requires that the content be stored in Django models. A tool called mnoGoSearch might be an option, as I think it can spider a website like Google does, but I'm not sure how active that project is anymore; the project site seems a bit dated.
I'm curious about using tools like Solr, ElasticSearch, or Whoosh, though I believe that those tools are only the indexing engine and don't handle the parsing of search content. Does anyone have any recommendations as to how one may index static html content for retrieving as a set of search results? Thanks for reading and for any feedback you have.
With Solr, you would write code that retrieves the content to be indexed, parses out the target portions from each item, and then sends them to Solr for indexing.
You would then interact with Solr for search and have it return either the entire indexed document, an ID, or some other identifying information about the original indexed content, and use that to display results to the user.
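The retrieve, parse, index, and search steps can be sketched with the standard library alone. This is not Solr code: the small in-memory inverted index below is a stand-in for the indexing engine, and TextExtractor, extract_text, build_index and search are hypothetical names chosen for the example.

```python
from collections import defaultdict
from html.parser import HTMLParser
import re

class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""
    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)

def extract_text(html_source):
    """Parse out the target portion: here, all visible text."""
    parser = TextExtractor()
    parser.feed(html_source)
    parser.close()
    return " ".join(" ".join(parser.parts).split())

def build_index(pages):
    """pages maps a document ID (e.g. a URL path) to its HTML source."""
    index = defaultdict(set)
    for doc_id, html_source in pages.items():
        for word in re.findall(r"\w+", extract_text(html_source).lower()):
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return the IDs of documents containing every word in the query."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    return set.intersection(*[index.get(w, set()) for w in words])
```

With a real engine, build_index would be replaced by a call that posts each extracted document and its ID to the indexer, and search by a query against it; the returned IDs map back to the static pages.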
