Generating RSS feed under Google App Engine - python

I want to provide an RSS feed from Google App Engine (Python).
I've tried using an ordinary request handler that generates an XML response. When I access the feed URL directly, I can see the feed correctly. However, when I try to subscribe to the feed in Google Reader, it says:
'The feed being requested cannot be found.'
I wonder whether this approach is right. I was considering using a static XML file and updating it with cron jobs, but since GAE doesn't support file I/O, that approach doesn't seem workable.
How can I solve this? Thanks!

I suggest two solutions:
GAE-REST: you can add it to your project and configure it, and it will generate RSS for you. However, the project is old and no longer maintained.
Do what I do: render the list of entities through a template. This way I succeeded in generating a feed (with GeoRSS points) that Google Reader can read. The template is:
<title>{{host}}</title>
<link href="http://{{host}}" rel="self"/>
<id>http://{{host}}/</id>
<updated>2011-09-17T08:14:49.875423Z</updated>
<generator uri="http://{{host}}/">{{host}}</generator>
{% for entity in entities %}
<entry>
<title><![CDATA[{{entity.title}}]]></title>
<link href="http://{{host}}/vi/{{entity.key.id}}"/>
<id>http://{{host}}/vi/{{entity.key.id}}</id>
<updated>{{entity.modified.isoformat}}Z</updated>
<author><name>{{entity.title|escape}}</name></author>
<georss:point>{{entity.geopt.lat|floatformat:2}} {{entity.geopt.lon|floatformat:2}}</georss:point>
<published>{{entity.added.isoformat}}Z</published>
<summary type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml">{{entity.text|escape}}</div>
</summary>
</entry>
{% endfor %}
</feed>
My handler is (you can also do this with python 2.7 as just a function outside a handler for a more minimal solution):
import datetime
import os

import webapp2
from google.appengine.api import memcache
from google.appengine.ext.webapp import template


class GeoRSS(webapp2.RequestHandler):
    def get(self):
        start = datetime.datetime.now() - datetime.timedelta(days=60)
        # Optional ?count=N query parameter; defaults to 1000 entries.
        count = int(self.request.get('count') or 1000)
        # memcache.get returns None on a miss; it does not raise KeyError.
        entities = memcache.get('entities')
        if entities is None:
            entities = (Entity.all()
                        .filter('modified >', start)
                        .filter('published =', True)
                        .order('-modified')
                        .fetch(count))
            memcache.set('entities', entities)
        template_values = {
            'entities': entities,
            'request': self.request,
            'host': os.environ.get('HTTP_HOST', os.environ['SERVER_NAME']),
        }
        path = os.path.join(os.path.dirname(__file__), 'templates/georss.html')
        output = template.render(path, template_values)
        self.response.headers['Cache-Control'] = 'public,max-age=86400'
        # The template produces an Atom document, so serve it as Atom.
        self.response.headers['Content-Type'] = 'application/atom+xml'
        self.response.out.write(output)
I hope some of this works for you; both approaches worked for me.
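One detail of the caching step is worth spelling out: App Engine's memcache.get returns None on a cache miss rather than raising an exception, so the usual pattern is "get, check for None, compute, set". Below is a minimal sketch of that pattern. `DictCache` and `get_or_compute` are hypothetical names, and the dict-backed class only stands in for memcache so that the example runs anywhere.

```python
class DictCache(object):
    """In-memory stand-in for a memcache-like client (hypothetical, for illustration)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        # Like memcache.get, return None when the key is absent.
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


def get_or_compute(cache, key, compute):
    """Return the cached value for key, computing and storing it on a miss."""
    value = cache.get(key)
    if value is None:
        value = compute()
        cache.set(key, value)
    return value


cache = DictCache()
calls = []


def load_entities():
    calls.append(1)  # record each time the expensive query would run
    return ["post-1", "post-2"]


first = get_or_compute(cache, "entities", load_entities)
second = get_or_compute(cache, "entities", load_entities)
print(first == second, len(calls))
```

In a handler, `compute` would be the datastore query and `cache` the memcache module itself. Note that a legitimately empty result is stored as an empty list, not None, so it still counts as a hit.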

I have an Atom feed generator for my blog, which runs on AppEngine/Python. I use the Django 1.2 template engine to construct the feed. My template looks like this:
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"
xml:lang="en"
xml:base="http://www.example.org">
<id>urn:uuid:4FC292A4-C69C-4126-A9E5-4C65B6566E05</id>
<title>Adam Crossland's Blog</title>
<subtitle>opinions and rants on software and...things</subtitle>
<updated>{{ updated }}</updated>
<author>
<name>Adam Crossland</name>
<email>adam@adamcrossland.net</email>
</author>
<link href="http://blog.adamcrossland.net/" />
<link rel="self" href="http://blog.adamcrossland.net/home/feed" />
{% for each_post in posts %}{{ each_post.to_atom|safe }}
{% endfor %}
</feed>
Note: if you use any of this, you'll need to create your own UUID to go into the id node.
The updated node should contain the date and time at which the contents of the feed were last updated, in RFC 3339 format. Fortunately, there is a Python module, rfc3339, that takes care of this for you. An excerpt from the controller that generates the feed:
from rfc3339 import rfc3339

posts = Post.get_all_posts()
self.context['posts'] = posts
# Initially, we'll assume that there are no posts in the blog and provide
# an empty date.
self.context['updated'] = ""
if posts is not None and len(posts) > 0:
    # But there are posts, so we will pick the most recent one to get a good
    # value for updated.
    self.context['updated'] = rfc3339(posts[0].updated(), utc=True)
response.content_type = "application/atom+xml"
Don't worry about the self.context['updated'] stuff. That is just how my framework provides a shortcut for setting template variables. The important part is that I encode the date that I want to use with the rfc3339 function. Also, I set the content_type property of the Response object to application/atom+xml.
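If you would rather not depend on a separate module, the same kind of timestamp can be produced with the standard library. This is a minimal sketch for Python 3, not the rfc3339 module's implementation; `to_rfc3339` is a hypothetical helper name, and it assumes that naive datetimes are already in UTC.

```python
from datetime import datetime, timezone


def to_rfc3339(dt):
    """Format a datetime as an RFC 3339 UTC timestamp such as 2011-09-17T08:14:49Z."""
    if dt.tzinfo is None:
        # Assumption: naive values are stored in UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


print(to_rfc3339(datetime(2011, 9, 17, 8, 14, 49)))
```

Aware datetimes in other zones are converted to UTC first, so the trailing Z is always accurate.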
The only other missing piece is that the template uses a method called to_atom to turn the Post object into Atom-formatted data:
def to_atom(self):
    "Create an ATOM entry block to represent this Post."
    from rfc3339 import rfc3339
    url_for = self.url_for()
    atom_out = "<entry>\n\t<title>%s</title>\n\t<link href=\"http://blog.adamcrossland.net/%s\" />\n\t<id>%s</id>\n\t<summary>%s</summary>\n\t<updated>%s</updated>\n </entry>" % (self.title, url_for, self.slug_text, self.summary_for(), rfc3339(self.updated(), utc=True))
    return atom_out
That's all that is required as far as I know, and this code does generate a perfectly-nice and working feed for my blog. Now, if you really want to do RSS instead of Atom, you'll need to change the format of the feed template, the Post template and the content_type, but I think that is the essence of what you need to do to get a feed generated from an AppEngine/Python application.
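One caveat with building the entry by string interpolation: a title or summary that contains & or < would make the feed ill-formed unless it is escaped first. Here is a minimal sketch of the same kind of entry with escaping, using xml.sax.saxutils from the standard library. `atom_entry` and its parameters are hypothetical names for illustration, not part of the blog's code.

```python
from xml.sax.saxutils import escape, quoteattr


def atom_entry(title, url, entry_id, summary, updated):
    """Build one Atom <entry> with text and attribute values escaped."""
    return (
        "<entry>\n"
        "\t<title>%s</title>\n"
        "\t<link href=%s />\n"  # quoteattr supplies the surrounding quotes
        "\t<id>%s</id>\n"
        "\t<summary>%s</summary>\n"
        "\t<updated>%s</updated>\n"
        "</entry>"
    ) % (escape(title), quoteattr(url), escape(entry_id),
         escape(summary), escape(updated))


entry = atom_entry(
    "Q&A <tips>",
    "http://example.org/post?a=1&b=2",
    "urn:example:post-1",
    "Short & sweet",
    "2011-09-17T08:14:49Z",
)
print(entry)
```

Because the entry is already escaped, the feed template still needs to insert it unescaped (as the |safe filter does in the template above).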

There's nothing special about generating XML as opposed to HTML - provided you set the content type correctly. Pass your feed to the validator at http://validator.w3.org/feed/ and it will tell you what's wrong with it.
If that doesn't help, you'll need to show us your source - we can't debug your code for you if you won't show it to us.
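Before pasting a feed into the validator, a quick local check can catch the most basic problems. The sketch below only tests that the document is well-formed XML and that an Atom feed has its required top-level elements; it is not a substitute for the feed validator. `quick_feed_check` and the sample strings are hypothetical.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"


def quick_feed_check(xml_text):
    """Return a list of problems found; an empty list means the basic checks passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return ["not well-formed XML: %s" % exc]
    problems = []
    if root.tag != ATOM_NS + "feed":
        problems.append("root element is %r, expected an Atom <feed>" % root.tag)
    # Atom requires these elements as direct children of <feed>.
    for required in ("id", "title", "updated"):
        if root.find(ATOM_NS + required) is None:
            problems.append("missing <%s>" % required)
    return problems


GOOD = ('<feed xmlns="http://www.w3.org/2005/Atom"><id>urn:example:feed</id>'
        '<title>t</title><updated>2011-09-17T08:14:49Z</updated></feed>')
BAD = ('<feed xmlns="http://www.w3.org/2005/Atom">'
       '<channel><title>t</title></channel></feed>')

print(quick_feed_check(GOOD))
print(quick_feed_check(BAD))
```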

Related

replace some html content with values

I want to send a verification email. The email consists of HTML, which I saved in a file at the path email_templates/verify.html. The problem is that some values in the HTML file are unknown until runtime. For instance, the email addresses the user by name, but since each email goes to a different person, I can't hard-code the name in the template. One solution that comes to mind is to use some formatting technique along the lines of
<div>
hello {usrname}!
</div>
and then in the python code do something like:
lines = open('email_templates/verify.html', 'r').read()
lines.format(usrname='joe')
Although this code can work, it has some issues:
every {} in the HTML file may be mistaken for a placeholder to be formatted
the code in its current form is not very readable
the code is not elegant
for someone reading the HTML who doesn't know Python, the formatting placeholders will be confusing
Is there a better way to approach this?
This can and should be done through templating.
You mentioned that placeholders might confuse someone reading the HTML, but in practice they don't: a templating engine keeps the HTML looking like HTML and makes template tags clearly recognizable as template tags. The engine defines which placeholders you can and cannot use. Engines such as Jinja2 also compile and cache templates, so rendering the same template repeatedly is typically faster than re-reading and formatting the file each time.
Let's understand by example:
There are several templating engines out there. Jinja2 is one of the best ones.
First, install Jinja2.
pip install jinja2
Second, create a python file(name it anything you want) and a folder named 'templates'. Under 'templates' folder create your verify.html
Your folder structure should look like this:
folder1
|
|--> pythonfile.py
|--> templates
     |
     |--> verify.html
Third, put some sample code in the HTML file. I have this example put in my verify.html:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Index</title>
</head>
<body>
<h1>Dear {{ user }}!</h1>
<h4>
Hope you are fine.
</h4>
<p>
Thank you for signing up. Here is your {{ coupon_code }}
</p>
</body>
</html>
This HTML file contains ordinary HTML tags, plus two placeholders written with double curly braces. Jinja treats the name inside the double curly braces as a variable, and our Python file supplies the value of that variable when the template is rendered.
Jinja is also strict about the delimiters: you can't use just any brackets. If I had written "<>" instead of "{{ }}", it would not have worked, so there are some rules to follow.
Read more here: Jinja allowed tags and filters
Fourth, copy this code into the python file we created.
# Imports
from jinja2 import Environment, FileSystemLoader

# Name of the folder where the template file is located.
file_loader = FileSystemLoader('templates')
# The environment is needed to load template objects.
env = Environment(loader=file_loader)
# Path of the HTML file relative to the templates folder.
template = env.get_template('verify.html')
# Data to substitute into the HTML template.
input_dict = {
    'user': 'Harry',
    'coupon_code': '12313ASDSA4',
}
# Render the HTML with the data substituted in.
output = template.render(input_dict)
print(output)
Now run this python file.
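If installing a template engine is not an option, the standard library's string.Template avoids the main problem with str.format from the question: it uses $name placeholders, so literal curly braces in the HTML (for example in inline CSS) are left alone. This is a different technique from Jinja2, shown only as a minimal sketch; `EMAIL` and `render_email` are hypothetical names, and the value is HTML-escaped by hand because string.Template does no escaping.

```python
from html import escape
from string import Template

# Curly braces in the CSS are not placeholders for string.Template.
EMAIL = Template(
    "<style>p { margin: 0 }</style>\n"
    "<div>\n"
    "  hello $usrname!\n"
    "</div>\n"
)


def render_email(usrname):
    """Fill in the template, escaping the value so it cannot inject markup."""
    return EMAIL.substitute(usrname=escape(usrname))


print(render_email("Joe & Co"))
```

substitute raises KeyError when a placeholder has no value, which catches typos in placeholder names early; safe_substitute leaves unknown placeholders in place instead.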

Python+Flask dynamic generated RSS feed is invalid

I'm trying to create an RSS feed for my blog app at patife.com/rss/. The app is built in Python with Flask. I tried creating a template that dynamically generates the RSS with all entries, but it's not valid:
I can't seem to convert the date format to RFC 822 using Jinja functions. I was trying the function strfdate.
The entry's actual content, which goes inside the description tag, isn't handling HTML very well.
This is the current code (I removed the link generator because it works, and I can't make posts with too many links):
<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
<channel>
<title>Patife.com</title>
<link>http://www.patife.com/</link>
<description>Startups. I can't help myself.</description>
{% for entry in entries %}
<item>
<title>{{ entry.title_en }}</title>
<link>http://www.patife.com/entries/{{ entry.id }}</link>
<guid>http://www.patife.com/entries/{{ entry.id }}</guid>
<pubDate>{{ entry.date_created.strftime('') }}</pubDate>
<description>{{ entry.text_en|safe }}</description>
</item>
{% endfor %}
</channel>
</rss>
Your dates are invalid because you are using naive datetimes, which have no timezone information associated with them. Many database setups store datetimes without a timezone, so you'll either need to convert all of your naive datetimes to aware datetimes or just include the timezone in your template. Assuming the stored values are in UTC (note that RFC 822 accepts GMT or UT as the zone name, but not UTC):
<pubDate>{{ entry.date_created.strftime('%a, %d %b %Y %H:%M:%S') }} GMT</pubDate>
The reason the HTML isn't validating is that when you embed HTML in XML, it is parsed as XML. RSS doesn't support arbitrary tags, so validation fails. XML allows you to embed unescaped text in a node by wrapping it in CDATA delimiters.
<description><![CDATA[{{ entry.text_en|safe }}]]></description>
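Both fixes can also be done in Python before the values reach the template. The sketch below uses email.utils.format_datetime from the standard library (Python 3) to produce an RFC 822 style date, and wraps content in CDATA while guarding against a literal "]]>" inside the text, which would otherwise end the section early. `rss_date` and `cdata` are hypothetical helper names, and naive datetimes are assumed to be in UTC.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def rss_date(dt):
    """Format a datetime for <pubDate>; naive values are assumed to be UTC."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return format_datetime(dt.astimezone(timezone.utc), usegmt=True)


def cdata(text):
    """Wrap text in a CDATA section, splitting any ']]>' so it cannot close the section."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


print(rss_date(datetime(2011, 9, 17, 8, 14, 49)))
print(cdata("<p>a ]]> b</p>"))
```

In Flask, helpers like these could be registered as Jinja filters so that the template only applies them to each entry.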

Generating XML/Feed for your Python Blog

I've been trying to add RSS feeds in my blog(webapp2 application - Jinja2 templates), this is what I have:
class XMLHandler(Handler):
    def get(self):
        posts = db.GqlQuery("SELECT * FROM Post WHERE is_draft = FALSE ORDER BY created DESC")
        self.render("xmltemplate.xml", posts=posts, mimetype='application/xml')
xmltemplate.xml looks like this:
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<channel>
<title>Blag</title>
<link>http://www.blagonudacity.appspot.com/</link>
<description>Yet Another Blag</description>
{%for post in posts %}
<entry>
<title>{{post.subject}}></title>
<link href="http://www.blagonudacity.appspot.com/post/{{ post.key().id()}}" rel="alternate" />
<updated>{{post.created}}</updated>
<author><name>Prakhar Srivastav</name></author>
<summary type="html"> {{ post.content }} </summary>
</entry>
{%endfor%}
</channel>
</feed>
What I get in my browser when I navigate to the relevant page, /feeds/all.atom.xml,
is just an HTML page with the markup. It doesn't look the way XML pages usually look in a browser. What am I doing wrong here? Here is the demo
I saw that the page is delivered with the content type text/html. This could be one problem; I suggest setting it to text/xml, or to application/atom+xml for an Atom feed (more details can be found here).
How the page is displayed also depends heavily on the browser. I guess you are using Chrome (like me), where the link you provided looks like a web page. If you open it in Firefox you will see the "live bookmark" styled page, although the entries don't show. I'm not sure whether this is caused by a problem with your markup or by a problem with Firefox and Atom feeds.
The XML file itself seems to be OK (checked with the W3C validator).
UPDATE: OK, there seems to be something wrong with your Atom XML: it is valid XML, as mentioned above, but it does not seem to be valid Atom data (according to the feed validator).
I tried to bookmark it in Firefox and it does not show any entries (just like the preview mentioned above).
So I think you should take a look at the Atom feed format; e.g. this and this could help.
I'm not really sure, but looking at your XML I think you may have mixed up Atom and RSS a little: <channel> and <description> are RSS 2.0 elements, while an Atom <feed> contains its <id>, <title>, <updated> and <entry> elements directly.
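To make the structural difference concrete, here is a minimal sketch that builds an Atom feed with the standard library's ElementTree (Python 3), with no <channel> wrapper and with the elements Atom expects directly under <feed>. `build_atom_feed`, `_child`, and the post dictionary keys are hypothetical; this is not the asker's handler, only an illustration of the document shape.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM)  # serialize Atom as the default namespace


def _child(parent, name, text=None, **attrs):
    """Append an Atom-namespaced child element and return it."""
    element = ET.SubElement(parent, "{%s}%s" % (ATOM, name), attrs)
    element.text = text
    return element


def build_atom_feed(feed_id, title, updated, author_name, posts):
    """Return an Atom document as a string; posts is a list of dicts."""
    feed = ET.Element("{%s}feed" % ATOM)
    _child(feed, "id", feed_id)
    _child(feed, "title", title)
    _child(feed, "updated", updated)
    author = _child(feed, "author")
    _child(author, "name", author_name)
    for post in posts:
        entry = _child(feed, "entry")
        _child(entry, "id", post["url"])
        _child(entry, "title", post["title"])
        _child(entry, "updated", post["updated"])
        _child(entry, "link", href=post["url"], rel="alternate")
        _child(entry, "summary", post["summary"], type="html")
    return ET.tostring(feed, encoding="unicode")


feed_xml = build_atom_feed(
    "http://example.org/",
    "Example blog",
    "2011-09-17T08:14:49Z",
    "Example Author",
    [{"url": "http://example.org/post/1", "title": "First <post>",
      "updated": "2011-09-17T08:14:49Z", "summary": "<p>Hello</p>"}],
)
print(feed_xml)
```

ElementTree escapes text and attribute values on serialization, so HTML in the summary ends up entity-escaped, which is what type="html" expects.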

how do i print outputs to html page using python?

I want the user to enter a sentence, which I then break up into a list. I have the HTML page done, but I have trouble passing that sentence to Python.
How do I properly send the user input to be processed by Python and output the result to a new page?
There are many Python web frameworks. For example, to break up a sentence using bottle:
break-sentence.py:
#!/usr/bin/env python
from bottle import request, route, run, view

@route('/', method=['GET', 'POST'])
@view('form_template')
def index():
    return dict(parts=request.forms.sentence.split(),  # split on whitespace
                show_form=request.method == 'GET')  # show form for GET requests

run(host='localhost', port=8080)
And the template file form_template.tpl that is used both to show the form and the sentence parts after processing in Python (see index() function above):
<!DOCTYPE html>
<title>Break up sentence</title>
%if show_form:
<form action="/" method="post">
<label for="sentence">Input a sentence to break up</label>
<input type="text" name="sentence" />
</form>
%else:
Sentence parts:<ol>
%for part in parts:
<li> {{ part }}
%end
</ol>
%end
request.forms.sentence is used in Python to access user input from <input name="sentence"/> field.
To try it you could just download bottle.py and run:
$ python break-sentence.py
Bottle server starting up (using WSGIRefServer())...
Listening on http://localhost:8080/
Hit Ctrl-C to quit.
Now you can visit http://localhost:8080/.
Have you tried Google? This page sums up the possibilities, and is one of the first results when googling 'python html'.
As far as I know, the two easiest options for your problem are the following.
1) CGI scripting. You write a python script and configure it as a CGI-script (in case of most HTTP-servers by putting it in the cgi-bin/ folder). Next, you point to this file as the action-attribute of the form-tag in your HTML-file. The python-script will have access to all post-variables (and more), thus being able to process the input and write it as a HTML-file. Have a look at this page for a more extensive description. Googling for tutorials will give you easier step-by-step guides, such as this one.
2) Use Django. This is better suited to larger projects, but giving it a try at this level may give you some insight and whet your appetite for future work ;)
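For comparison with the framework and CGI options, the same form handling can be sketched as a plain WSGI application using only the standard library (Python 3). This is not bottle's API; `app` and `FORM` are hypothetical names. The function reads the POSTed form body, splits the sentence on whitespace, and returns the parts as an HTML list.

```python
from html import escape
from io import BytesIO
from urllib.parse import parse_qs

FORM = ('<form action="/" method="post">'
        '<input type="text" name="sentence" /></form>')


def app(environ, start_response):
    """Show the form on GET; on POST, list the words of the submitted sentence."""
    if environ["REQUEST_METHOD"] == "POST":
        size = int(environ.get("CONTENT_LENGTH") or 0)
        form = parse_qs(environ["wsgi.input"].read(size).decode("utf-8"))
        parts = form.get("sentence", [""])[0].split()
        body = "<ol>%s</ol>" % "".join("<li>%s</li>" % escape(p) for p in parts)
    else:
        body = FORM
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [body.encode("utf-8")]


# Call the app directly with a hand-built environ, as a server would.
payload = b"sentence=hello+big+world"
environ = {
    "REQUEST_METHOD": "POST",
    "CONTENT_LENGTH": str(len(payload)),
    "wsgi.input": BytesIO(payload),
}
statuses = []
response = b"".join(app(environ, lambda status, headers: statuses.append(status)))
print(response.decode("utf-8"))
```

To serve it locally, wsgiref.simple_server.make_server('localhost', 8080, app).serve_forever() from the standard library should be enough for experimenting.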

Exporting data as an XML file in google appengine

I'm trying to export data to an XML file in Google App Engine, using Python/Django. The file is expected to contain up to 100K records converted to XML. Is there an equivalent in App Engine of:
f = file('blah', 'w+')
f.write('whatever')
f.close()
?
Thanks
Edit
What I'm trying to achieve is exporting some information to an XML document so it can be exported to Google Places (I don't know exactly how this will work, but I've been told that Google will fetch this XML file from time to time).
You could also generate XML with Django templates. There's no special reason that a template has to contain HTML. I use this approach for generating the Atom feed for my blog. The template looks like this. I pass it the collection of posts that go into the feed, and each Post entity has a to_atom method that generates its Atom representation.
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"
xml:lang="en"
xml:base="http://www.example.org">
<id>urn:uuid:4FC292A4-C69C-4126-A9E5-4C65B6566E05</id>
<title>Adam Crossland's Blog</title>
<subtitle>opinions and rants on software and...things</subtitle>
<updated>{{ updated }}</updated>
<author>
<name>Adam Crossland</name>
<email>adam@adamcrossland.net</email>
</author>
<link href="http://blog.adamcrossland.net/" />
<link rel="self" href="http://blog.adamcrossland.net/home/feed" />
{% for each_post in posts %}{{ each_post.to_atom|safe }}
{% endfor %}
</feed>
Every datastore model class has an instance method, to_xml(), that generates an XML representation of that datastore type.
Run your query to get the records you want.
Set the content type of the response as appropriate. If you want to prompt the user to save the file locally, add a Content-Disposition header as well.
Generate whatever XML preamble you need to come before your record data.
Iterate through the query results, calling to_xml() on each and adding that output to your response.
Do whatever closing of the XML preamble you need to do.
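The steps above can be sketched as a generator that yields the document piece by piece, so the whole export never has to exist as a file. This is a minimal sketch in Python 3 with plain dictionaries standing in for datastore entities; `export_xml`, the element names, and the record fields are hypothetical, and with real entities each record line would come from to_xml() instead.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def export_xml(records):
    """Yield an XML document in chunks: preamble, one element per record, closing tag."""
    yield '<?xml version="1.0" encoding="utf-8"?>\n<places>\n'
    for record in records:
        yield "  <place><name>%s</name><city>%s</city></place>\n" % (
            escape(record["name"]), escape(record["city"]))
    yield "</places>\n"


records = [
    {"name": "Fish & Chips", "city": "Leeds"},
    {"name": "Cafe <One>", "city": "York"},
]
document = "".join(export_xml(records))
# Parse the result back to confirm it is well-formed.
root = ET.fromstring(document.encode("utf-8"))
print(len(root), root[0].find("name").text)
```

In a request handler, each chunk would be passed to self.response.out.write after setting the Content-Type header, instead of being joined into one string.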
What the author is talking about is probably Sitemaps.
Sitemaps are an easy way for webmasters to inform search engines about pages on their sites that are available for crawling. In its simplest form, a Sitemap is an XML file that lists URLs for a site along with additional metadata about each URL (when it was last updated, how often it usually changes, and how important it is, relative to other URLs in the site) so that search engines can more intelligently crawl the site.
What I think you need is to write the XML to the response object, like so:
doc.writexml(self.response.out)
In my case I do this based on mime types sent from the client:
_MIME_TYPES = {
    # xml mime type needs lower priority, that's needed for WebKit based browsers,
    # which add application/xml equally to text/html in accept header
    'xml': ('application/xml;q=0.9', 'text/xml;q=0.9', 'application/x-xml;q=0.9',),
    'html': ('text/html',),
    'json': ('application/json',),
}
mime = self.request.accept.best_match(reduce(lambda x, y: x + y, _MIME_TYPES.values()))
if mime:
    for shortmime, mimes in _MIME_TYPES.items():
        if mime in mimes:
            renderer = shortmime
            break
# call specific render function
renderer = 'render' + renderer
logging.info('Using %s for serving response' % renderer)
try:
    getattr(self.__class__, renderer)(self)
except AttributeError, e:
    logging.error("Missing renderer %s" % renderer)
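The idea of choosing a renderer from the client's Accept header can also be shown without the framework's accept object. The sketch below parses the header by hand and picks the supported type with the highest q value. It is deliberately simplified (wildcards such as */* are ignored), and `parse_accept`, `pick_renderer`, and `RENDERERS` are hypothetical names, not part of webapp or WebOb.

```python
RENDERERS = {
    "text/html": "html",
    "application/json": "json",
    "application/xml": "xml",
    "text/xml": "xml",
}


def parse_accept(header):
    """Return (media_type, q) pairs from an Accept header, highest q first."""
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if not parts[0]:
            continue
        q = 1.0  # a media type without an explicit q counts as q=1
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        prefs.append((parts[0].lower(), q))
    # sorted() is stable, so equal q values keep the client's order.
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


def pick_renderer(header, default="html"):
    """Return the short renderer name for the best supported media type."""
    for media_type, q in parse_accept(header):
        if q > 0 and media_type in RENDERERS:
            return RENDERERS[media_type]
    return default


print(pick_renderer("application/xml;q=0.9,text/html"))
print(pick_renderer("application/json"))
```

Falling back to a default renderer also avoids the case where no supported type matches and no renderer name is ever assigned.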
