Generating XML/Feed for your Python Blog - python

I've been trying to add RSS feeds in my blog(webapp2 application - Jinja2 templates), this is what I have:
class XMLHandler(Handler):
def get(self):
posts = db.GqlQuery("SELECT * FROM Post WHERE is_draft = FALSE ORDER BY created DESC")
self.render("xmltemplate.xml", posts=posts, mimetype='application/xml')
xmltemplate.xml looks like this:
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<channel>
<title>Blag</title>
<link>http://www.blagonudacity.appspot.com/</link>
<description>Yet Another Blag</description>
{%for post in posts %}
<entry>
<title>{{post.subject}}></title>
<link href="http://www.blagonudacity.appspot.com/post/{{ post.key().id()}}" rel="alternate" />
<updated>{{post.created}}</updated>
<author><name>Prakhar Srivastav</name></author>
<summary type="html"> {{ post.content }} </summary>
</entry>
{%endfor%}
</channel>
</feed>
What i'm getting in my browser when I migrate to the relevant page /feeds/all.atom.xml
is just a html page with the markup. It doesn't look like how XML pages look in browser. What am I doing wrong here? Here is the demo

I saw that the page is delivered with content type text/html, this could be one problem, i suggest you should set this to text/xml (more details can be found here.
Also it highly depends on the browser on how this is displayed, i guess you are using chrome (like me) where the link provided by you looks like a webpage, if you open it in firefox you will see the "live bookmark" styled page, however the entries don't show. I'm not sure if this is because of some problem with your markup or some problem with firefox and atom feeds.
The xml file itself seems to be ok (checked with w3 validator).
UPDATE: Ok, there seems to be something wrong with your atom XML (it is valid xml, as mentioned above) however it does not seem to be valid Atom data (according to the feed validator).
I tried to bookmark it in firefox and it does not show any entries (just like the preview mentioned above).
So i think you should take a look at the atom feed e.g. this and this could help.
I'm not really sure but when looking at your XML i think that you may have mixed up Atom and Rss a little.

Related

XML uses an external DTD for validation - XML parser is Python (lxml) and this parser cannot load the external DTD from the HTTPS side

I have another problem I'm desperate about.
I think there are many solutions to this problem, but I would like to know if my approach can be implemented somehow.
I have a XML file uses one external DTD and is defined with the XML DOCTYP.
The xml-file are parsed with Python (lxml). So it is possible to validate the different files automatically with the DTD's defined in the XML DOCTYP. I use an external DTD which can be accessed via internet address. But this internet site redirects every request to the HTTPS port. For this reason Python cannot access the external DTD.
Thanks to a friend of mine I was able to use an old, unused website that still runs on HTTP. The DTD on this stored website can be found and used by the parser.
Now for my question. Is it possible to use an external DTD with Python-lxml that is only accessible via a HTTPS server? Unfortunately I have no possibility to create an area on the server that uses the HTTP port.
I've already tried to get the external DTD via an HTTP request but it gets redirected to the HTTPS port.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE book PUBLIC "-//AA//Test//EN" "***">
<!-- <!DOCTYPE book PUBLIC "-//AA//Test//EN" "***"> -->
<book>
<book-meta>
<book-id pub-id-type="other">handbook</book-id>
<book-title-group Id="1">
<book-title name="Hallo">The NCBI Handbook</book-title>
</book-title-group>
</book-meta>
</book>
For completeness here is an example DTD.
<!ELEMENT book ANY>
<!ATTLIST book
Release CDATA "v0.0.1"
>
<!ELEMENT book-meta ANY> <!-- # related objects: 0 -->
<!ATTLIST book-meta
Value CDATA "Das ist eine Information"
>
<!ELEMENT book-id ANY> <!-- # related objects: 0 -->
<!ATTLIST book-id
pub-id-type CDATA #REQUIRED
>
<!ELEMENT book-title-group ANY> <!-- # related objects: 0 -->
<!ATTLIST book-title-group
Id CDATA #IMPLIED
>
<!ELEMENT book-title ANY> <!-- # related objects: 0 -->
<!ATTLIST book-title
name CDATA #REQUIRED
>
For parsing the XML files I use a python script with the library lxml.
Following is the test program.
import xml.etree.ElementTree as ET
import lxml
from lxml import etree
myParser = lxml.etree.XMLParser(attribute_defaults = True, dtd_validation = True, load_dtd =True, no_network = False)
xmlFile = lxml.etree.parse("XML_DTDValidation.xml", parser=myParser)
xmlFile.xinclude()
xmlFile.write("XML_DTDValidation_out.xml",method="xml",xml_declaration=True, encoding='utf-8',pretty_print=True)
I hope I could summarize my problem well and someone can help me.
This page describes some ways to work around this.
You can either:
set up an XML catalog (which you could use to store the DTD somewhere local)
create your own resolver class which either redirects the URL, or retrieves the DTD from somewhere else.

Python+Flask dynamic generated RSS feed is invalid

I'm trying to create an RSS feed for my Blog app at patife.com/rss/. The app is built on python with Flask. I tried creating a template that would dynamically generate the RSS with all entries.. but its not valid
i can't seem to convert date format to RFC-822 using JINJA functions. I was trying the function strfdate.
the entry actual content which gets inside the description tag isn't taking HTML very nicely.
This is the current code (i removed the link generator bc its working and i can't make posts with too many links)
<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
<channel>
<title>Patife.com</title>
<link>http://www.patife.com/</link>
<description>Startups. I can't help myself.</description>
{% for entry in entries %}
<item>
<title>{{ entry.title_en }}</title>
<link>http://www.patife.com/entries/{{ entry.id }}</link>
<guid>http://www.patife.com/entries/{{ entry.id }}</guid>
<pubDate>{{ entry.date_created.strftime('') }}</pubDate>
<description>{{ entry.text_en|safe }}</description>
</item>
{% endfor %}
</channel>
</rss>
Your dates are invalid because you are using naive datetimes. They have no timezone information associated with the them. Most databases don't support timezone-aware values, so you'll either need to convert all of your naive datetimes to aware datetimes or just include the timezone in your template.
<pubDate>{{ entry.date_created.strftime('%a, %d %b %y %T') }} UTC</pubDate>
The reason the HTML isn't validating is that when you embed HTML in XML it gets treated as XML. RSS doesn't support arbitrary tags, so validation fails. XML allows to you embed unescaped values in a node by wrapping it in CDATA delimiters.
<description><![CDATA[{{ entry.text_en|safe }}]]></description>

Can we include html tags into <template> of AIML files

I am trying to build a basic bot with the help of AIML file. Where I can ask a question and matched pattern will be returned for that pattern. Sample pattern is this in my AIML file.
<category>
<pattern>TEST</pattern>
<template>
Hi..This is the testing pattern. How are u doing
</template>
</category>
I am using PyAIML package for integrating python with AIML. So, if I ask "test", I get the response as
"Hi..This is the testing pattern. How are u doing".
query --->test
Answer --> Hi..This is the testing pattern. How are u doing
But if I change my above pattern to
<category>
<pattern>TEST</pattern>
<template>
<html>
<b>Hi..This is the testing pattern. </b> How are u doing
</html>
</template>
</category>
Basically If I add html tags. Then my bot does not respond. It gives blank answer for "test". What can be the problem here? Here is my code in python
import aiml, sys
class Chat:
def main(self, query):
mybot = aiml.Kernel()
mybot.verbose(False)
mybot.learn('test.aiml')
chatReply = mybot.respond(query)
return chatReply
if __name__ == '__main__':
print Chat().main(sys.argv[1])
Also if html tags are not working because I am running the code in python interpreter, then how to test if it will work or not.
AIML files are extended from XML. The HTML you have added is actually being seen as part of the XML document.
To put HTML into a tag within an XML document, you could use XML CDATA section so that your HTML text is interpreted as purely textual data rather than XML tags.
<category>
<pattern>TEST</pattern>
<template><![CDATA[
<html>
<b>Hi..This is the testing pattern. </b> How are u doing
</html>]]>
</template>
</category>
This will then contain the HTML tags in the output.
I used a alternative for this. According to this wiki page, AIML can contain html tags. But with pyaiml, it didn't worked out. So I changed my AIML patterns as
<category>
<pattern>TEST</pattern>
<template>
((html))
((b))Hi..This is the testing pattern. ((/b)) How are u doing
((/html))
</template>
</category>
So in the response I will be replacing '((' with '<'. I know its not a proper solution, but its a alternative.

Generating RSS feed under Google App Engine

I want to provide rss feed under google app engine/python.
I've tried to use usual request handler and generate xml response. When I access the feed url directly, I can see the feed correctly, however, when I'm trying to subscribe to the feed in google reader, it says that
'The feed being requested cannot be found.'
I wonder whether this approach is right. I was considering using a static xml file and updating it by cron jobs. But while GAE doesn't support file i/o, this approach seems not going to work.
How to solve this? Thanks!
There're 2 solutions I suggest:
GAE-REST you can just add to your project and configure and it will make RSS for you but the project is old and no longer maintained.
Do like I do, use a template to write a list to and like this I could succeed generating RSS (GeoRSS) that can be read via google reader where template is:
<title>{{host}}</title>
<link href="http://{{host}}" rel="self"/>
<id>http://{{host}}/</id>
<updated>2011-09-17T08:14:49.875423Z</updated>
<generator uri="http://{{host}}/">{{host}}</generator>
{% for entity in entities %}
<entry>
<title><![CDATA[{{entity.title}}]]></title>
<link href="http://{{host}}/vi/{{entity.key.id}}"/>
<id>http://{{host}}/vi/{{entity.key.id}}</id>
<updated>{{entity.modified.isoformat}}Z</updated>
<author><name>{{entity.title|escape}}</name></author>
<georss:point>{{entity.geopt.lon|floatformat:2}},{{entity.geopt.lat|floatformat:2}}</georss:point>
<published>{{entity.added}}</published>
<summary type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml">{{entity.text|escape}}</div>
</summary>
</entry>
{% endfor %}
</feed>
My handler is (you can also do this with python 2.7 as just a function outside a handler for a more minimal solution):
class GeoRSS(webapp2.RequestHandler):
def get(self):
start = datetime.datetime.now() - timedelta(days=60)
count = (int(self.request.get('count'
)) if not self.request.get('count') == '' else 1000)
try:
entities = memcache.get('entities')
except KeyError:
entity = Entity.all().filter('modified >',
start).filter('published =',
True).order('-modified').fetch(count)
memcache.set('entities', entities)
template_values = {'entities': entities, 'request': self.request,
'host': os.environ.get('HTTP_HOST',
os.environ['SERVER_NAME'])}
dispatch = 'templates/georss.html'
path = os.path.join(os.path.dirname(__file__), dispatch)
output = template.render(path, template_values)
self.response.headers['Cache-Control'] = 'public,max-age=%s' \
% 86400
self.response.headers['Content-Type'] = 'application/rss+xml'
self.response.out.write(output)
I hope some of this works for you, both ways worked for me.
I have an Atom feed generator for my blog, which runs on AppEngine/Python. I use the Django 1.2 template engine to construct the feed. My template looks like this:
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"
xml:lang="en"
xml:base="http://www.example.org">
<id>urn:uuid:4FC292A4-C69C-4126-A9E5-4C65B6566E05</id>
<title>Adam Crossland's Blog</title>
<subtitle>opinions and rants on software and...things</subtitle>
<updated>{{ updated }}</updated>
<author>
<name>Adam Crossland</name>
<email>adam#adamcrossland.net</email>
</author>
<link href="http://blog.adamcrossland.net/" />
<link rel="self" href="http://blog.adamcrossland.net/home/feed" />
{% for each_post in posts %}{{ each_post.to_atom|safe }}
{% endfor %}
</feed>
Note: if you use any of this, you'll need to create your own uuid to go into the id node.
The updated node should contain the time and date on which contents of the feed were last updated in rfc 3339 format. Fortunately, Python has a library to take care of this for you. An excerpt from the controller that generates the feed:
from rfc3339 import rfc3339
posts = Post.get_all_posts()
self.context['posts'] = posts
# Initially, we'll assume that there are no posts in the blog and provide
# an empty date.
self.context['updated'] = ""
if posts is not None and len(posts) > 0:
# But there are posts, so we will pick the most recent one to get a good
# value for updated.
self.context['updated'] = rfc3339(posts[0].updated(), utc=True)
response.content_type = "application/atom+xml"
Don't worry about the self.context['updated'] stuff. That just how my framework provides a shortcut for setting template variables. The import part is that I encode the date that I want to use with the rfc3339 function. Also, I set the content_type property of the Response object to be application/atom+xml.
The only other missing piece is that the template uses a method called to_atom to turn the Post object into Atom-formatted data:
def to_atom(self):
"Create an ATOM entry block to represent this Post."
from rfc3339 import rfc3339
url_for = self.url_for()
atom_out = "<entry>\n\t<title>%s</title>\n\t<link href=\"http://blog.adamcrossland.net/%s\" />\n\t<id>%s</id>\n\t<summary>%s</summary>\n\t<updated>%s</updated>\n </entry>" % (self.title, url_for, self.slug_text, self.summary_for(), rfc3339(self.updated(), utc=True))
return atom_out
That's all that is required as far as I know, and this code does generate a perfectly-nice and working feed for my blog. Now, if you really want to do RSS instead of Atom, you'll need to change the format of the feed template, the Post template and the content_type, but I think that is the essence of what you need to do to get a feed generated from an AppEngine/Python application.
There's nothing special about generating XML as opposed to HTML - provided you set the content type correctly. Pass your feed to the validator at http://validator.w3.org/feed/ and it will tell you what's wrong with it.
If that doesn't help, you'll need to show us your source - we can't debug your code for you if you won't show it to us.

Exporting data as an XML file in google appengine

I'm trying to export data to an XML file in the Google appengine, I'm using Python/Django. The file is expected to contain upto 100K records converted to XML. Is there an equivalent in App Engine of:
f = file('blah', 'w+')
f.write('whatever')
f.close()
?
Thanks
Edit
What I'm trying to achieve is exporting some information to an XML document so it can be exported to google places (don't know exactly how this will work, but I've been told that google will fecth this xml file from time to time).
You could also generate XML with Django templates. There's no special reason that a template has to contain HMTL. I use this approach for generating the Atom feed for my blog. The template looks like this. I pass it the collection of posts that go into the feed, and each Post entity has a to_atom method that generate its Atom representation.
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"
xml:lang="en"
xml:base="http://www.example.org">
<id>urn:uuid:4FC292A4-C69C-4126-A9E5-4C65B6566E05</id>
<title>Adam Crossland's Blog</title>
<subtitle>opinions and rants on software and...things</subtitle>
<updated>{{ updated }}</updated>
<author>
<name>Adam Crossland</name>
<email>adam#adamcrossland.net</email>
</author>
<link href="http://blog.adamcrossland.net/" />
<link rel="self" href="http://blog.adamcrossland.net/home/feed" />
{% for each_post in posts %}{{ each_post.to_atom|safe }}
{% endfor %}
</feed>
Every datastore model class has an instance method to_xml() that will generate an XML representation of that datastore type.
Run your query to get the records you want
Set the content type of the response as appropriate - if you want to prompt the user to save the file locally, add a content-disposition header as well
generate whatever XML preamble you need to come before your record data
iterate through the query results, calling to_xml() on each and adding that output to your reponse
do whatever closing of the XML preamble you need to do.
What the author is talking about is probably Sitemaps.
Sitemaps are an easy way for webmasters to inform search engines about pages on their sites that are available for crawling. In its simplest form, a Sitemap is an XML file that lists URLs for a site along with additional metadata about each URL (when it was last updated, how often it usually changes, and how important it is, relative to other URLs in the site) so that search engines can more intelligently crawl the site.
And about what I think you need is to write the XML to request object like so:
doc.writexml(self.response.out)
In my case I do this based on mime types sent from the client:
_MIME_TYPES = {
# xml mime type needs lower priority, that's needed for WebKit based browsers,
# which add application/xml equally to text/html in accept header
'xml': ('application/xml;q=0.9', 'text/xml;q=0.9', 'application/x-xml;q=0.9',),
'html': ('text/html',),
'json': ('application/json',),
}
mime = self.request.accept.best_match(reduce(lambda x, y: x + y, _MIME_TYPES.values()))
if mime:
for shortmime, mimes in _MIME_TYPES.items():
if mime in mimes:
renderer = shortmime
break
# call specific render function
renderer = 'render' + renderer
logging.info('Using %s for serving response' % renderer)
try:
getattr(self.__class__, renderer)(self)
except AttributeError, e:
logging.error("Missing renderer %s" % renderer)

Categories

Resources