Exporting data as an XML file in google appengine

I'm trying to export data to an XML file on Google App Engine, using Python/Django. The file is expected to contain up to 100K records converted to XML. Is there an equivalent in App Engine of:
f = file('blah', 'w+')
f.write('whatever')
f.close()
?
Thanks
Edit
What I'm trying to achieve is exporting some information to an XML document so it can be submitted to Google Places (I don't know exactly how this will work, but I've been told that Google will fetch this XML file from time to time).

You could also generate XML with Django templates. There's no special reason that a template has to contain HTML. I use this approach for generating the Atom feed for my blog. The template looks like this. I pass it the collection of posts that go into the feed, and each Post entity has a to_atom method that generates its Atom representation.
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"
xml:lang="en"
xml:base="http://www.example.org">
<id>urn:uuid:4FC292A4-C69C-4126-A9E5-4C65B6566E05</id>
<title>Adam Crossland's Blog</title>
<subtitle>opinions and rants on software and...things</subtitle>
<updated>{{ updated }}</updated>
<author>
<name>Adam Crossland</name>
<email>adam#adamcrossland.net</email>
</author>
<link href="http://blog.adamcrossland.net/" />
<link rel="self" href="http://blog.adamcrossland.net/home/feed" />
{% for each_post in posts %}{{ each_post.to_atom|safe }}
{% endfor %}
</feed>

Every datastore model class has an instance method to_xml() that will generate an XML representation of that datastore type.
Run your query to get the records you want.
Set the content type of the response as appropriate. If you want to prompt the user to save the file locally, add a Content-Disposition header as well.
Generate whatever XML preamble you need to come before your record data.
Iterate through the query results, calling to_xml() on each and adding that output to your response.
Write whatever closing tags the XML preamble requires.
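The steps above can be sketched roughly as follows. This is a minimal, framework-agnostic sketch rather than App Engine code: FakeRecord is a hypothetical stand-in for a datastore entity (a real model instance would supply its own to_xml()), and the records wrapper element and the write_export name are assumptions made for illustration.

```python
import io


class FakeRecord(object):
    """Hypothetical stand-in for a datastore entity that offers to_xml()."""

    def __init__(self, name):
        self.name = name

    def to_xml(self):
        return "<record><name>%s</name></record>" % self.name


def write_export(records, out):
    """Write an XML document to `out`, one record at a time."""
    out.write('<?xml version="1.0" encoding="utf-8"?>\n')  # preamble
    out.write("<records>\n")  # opening wrapper (assumed element name)
    for record in records:  # iterate through the query results
        out.write(record.to_xml())
        out.write("\n")
    out.write("</records>\n")  # close what the preamble opened


buf = io.StringIO()  # stands in for the response stream
write_export([FakeRecord("a"), FakeRecord("b")], buf)
export_xml = buf.getvalue()
```

In a request handler, `out` would presumably be the response stream, with the Content-Type header (and optionally Content-Disposition) set before anything is written.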

What the author is talking about is probably Sitemaps.
Sitemaps are an easy way for webmasters to inform search engines about pages on their sites that are available for crawling. In its simplest form, a Sitemap is an XML file that lists URLs for a site along with additional metadata about each URL (when it was last updated, how often it usually changes, and how important it is, relative to other URLs in the site) so that search engines can more intelligently crawl the site.
As for what I think you need: write the XML to the response object, like so:
doc.writexml(self.response.out)
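For context, here is a minimal sketch of where such a doc object could come from, assuming it is an xml.dom.minidom document. The element names and the build_document helper are illustrative assumptions, and an in-memory StringIO stands in for self.response.out.

```python
import io
from xml.dom import minidom


def build_document(names):
    """Build a small DOM tree; the element names are illustrative only."""
    doc = minidom.Document()
    root = doc.createElement("places")
    doc.appendChild(root)
    for name in names:
        place = doc.createElement("place")
        place.appendChild(doc.createTextNode(name))  # text is escaped on output
        root.appendChild(place)
    return doc


out = io.StringIO()  # stands in for self.response.out
build_document(["Cafe A", "Shop & Co"]).writexml(out, encoding="utf-8")
xml_text = out.getvalue()
```

Because writexml() accepts any object with a write() method, the same call works against a response stream without building the whole string first.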
In my case I choose the renderer based on the MIME types the client sends in its Accept header:
_MIME_TYPES = {
    # The XML MIME types need lower priority for WebKit-based browsers,
    # which list application/xml alongside text/html in the Accept header.
    'xml': ('application/xml;q=0.9', 'text/xml;q=0.9', 'application/x-xml;q=0.9',),
    'html': ('text/html',),
    'json': ('application/json',),
}
# The rest runs inside the request handler method.
renderer = 'html'  # assumed fallback so that renderer is always defined
mime = self.request.accept.best_match(reduce(lambda x, y: x + y, _MIME_TYPES.values()))
if mime:
    for shortmime, mimes in _MIME_TYPES.items():
        if mime in mimes:
            renderer = shortmime
            break
# Call the specific render function, e.g. renderxml or renderhtml.
renderer = 'render' + renderer
logging.info('Using %s for serving response' % renderer)
try:
    getattr(self.__class__, renderer)(self)
except AttributeError:
    logging.error("Missing renderer %s" % renderer)
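To make the idea of choosing a renderer from the Accept header concrete, here is a simplified, standard-library-only sketch. It is not how WebOb's best_match is implemented: it honours only the client's q-values, and the RENDERERS table, function names and default are assumptions.

```python
# Maps MIME types to short renderer names (assumed table).
RENDERERS = {
    "text/html": "html",
    "application/xml": "xml",
    "text/xml": "xml",
    "application/json": "json",
}


def parse_accept(header):
    """Return (mime, q) pairs from an Accept header, highest q first."""
    prefs = []
    for part in header.split(","):
        fields = part.strip().split(";")
        mime = fields[0].strip()
        q = 1.0  # a missing q parameter means full preference
        for param in fields[1:]:
            name, _, value = param.strip().partition("=")
            if name == "q":
                q = float(value)
        if mime:
            prefs.append((mime, q))
    # sorted() is stable, so entries with equal q keep their header order.
    return sorted(prefs, key=lambda pair: -pair[1])


def pick_renderer(header, default="html"):
    """Pick the renderer for the client's most preferred supported type."""
    for mime, q in parse_accept(header):
        if q > 0 and mime in RENDERERS:
            return RENDERERS[mime]
    return default
```

For example, pick_renderer("text/html,application/xml;q=0.9") picks the HTML renderer, which is the behaviour the lowered XML priority in the snippet above aims for.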

Related

replace some html content with values

I want to send a verification email. The email body is HTML, which I saved in a file at email_templates/verify.html. The problem is that some values in the HTML file are unknown until runtime. For instance, the email addresses the recipient by username, and since each email goes to a different person, I can't hard-code the name in the template. One solution that comes to mind is to use a formatting placeholder along the lines of
<div>
hello {usrname}!
</div>
and then in the Python code do something like:
lines = open('email_templates/verify.html', 'r').read()
lines = lines.format(usrname='joe')
Although this code can in fact work, it has some issues:
Every {} in the HTML file can mistakenly be treated as a format placeholder
The code in its current form is not very readable
The code is not elegant
For an HTML reader who doesn't know Python, the formatting placeholders will be confusing
Is there a better way to approach this?
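The first issue in the list above is easy to reproduce. The following sketch uses a made-up page fragment with an inline style rule; the PAGE and ESCAPED strings and the try_format helper are illustrative, not part of the original question.

```python
# A page fragment with literal CSS braces next to a real placeholder.
PAGE = "<style>p {color: red}</style><div>hello {usrname}!</div>"


def try_format(template, **values):
    """Return the formatted text, or the name of the exception raised."""
    try:
        return template.format(**values)
    except (KeyError, ValueError, IndexError) as exc:
        return type(exc).__name__


# str.format treats "{color: red}" as a replacement field named "color".
naive = try_format(PAGE, usrname="joe")

# Doubling the literal braces makes str.format leave them alone.
ESCAPED = "<style>p {{color: red}}</style><div>hello {usrname}!</div>"
fixed = try_format(ESCAPED, usrname="joe")
```

Here naive ends up as the string "KeyError", while fixed contains the rendered HTML. Having to double every literal brace in the file is exactly the maintenance burden the question describes.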

This can and should be done through templating.
You mentioned that template placeholders might be confusing, but in practice they are not: templating engines make sure HTML looks like HTML and template tags look like template tags. The engine lays down the rules for which placeholders you can and can't use. Engines such as Jinja2 also compile and cache templates, so rendering is typically fast.
Let's understand by example:
There are several templating engines out there. Jinja2 is one of the best ones.
First, install Jinja2.
pip install jinja2
Second, create a Python file (name it anything you want) and a folder named 'templates'. Under the 'templates' folder, create your verify.html.
Your folder structure should look like this:
folder1
|
|--> pythonfile.py
|--> templates
     |
     |--> verify.html
Third, put some sample code in the HTML file. I have this example put in my verify.html:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Index</title>
</head>
<body>
<h1>Dear {{ user }}!</h1>
<h4>
Hope you are fine.
</h4>
<p>
Thank you for signing up. Here is your {{ coupon_code }}
</p>
</body>
</html>
This HTML file contains normal HTML tags plus two placeholders, each wrapped in double curly braces. Jinja treats the name inside the braces as a variable, and its value is supplied to the template by our Python file.
Also, for consistency, Jinja by default doesn't let you use just any delimiters. If I had written "<>" instead of "{{ }}", it would not have worked, so there are some rules to follow.
Read more here: Jinja allowed tags and filters
Fourth, copy this code into the python file we created.
# Imports
from jinja2 import Environment, FileSystemLoader
# Name of the folder where the template file is located.
file_loader = FileSystemLoader('templates')
# The environment is needed to load template objects.
env = Environment(loader=file_loader)
# Path of the HTML file relative to the templates folder.
template = env.get_template('verify.html')
# Data dictionary to be supplied to the HTML template.
input_dict = {
    'user': 'Harry',
    'coupon_code': '12313ASDSA4',
}
# Render the template with the data substituted in.
output = template.render(input_dict)
print(output)
Now run this python file.

XML uses an external DTD for validation - XML parser is Python (lxml) and this parser cannot load the external DTD from the HTTPS side

I have another problem that I'm desperate about.
I think there are many solutions to this problem, but I would like to know whether my approach can be made to work.
I have an XML file that uses an external DTD, declared in the XML DOCTYPE.
The XML files are parsed with Python (lxml), so the different files can be validated automatically against the DTDs declared in their DOCTYPE. The external DTD I use is reachable at an internet address, but that site redirects every request to the HTTPS port, and for this reason Python cannot access the external DTD.
Thanks to a friend of mine, I was able to use an old, unused website that still runs on HTTP. The DTD stored on this website can be found and used by the parser.
Now for my question: is it possible to use an external DTD with Python lxml when the DTD is only accessible via an HTTPS server? Unfortunately, I have no way to set up an area on the server that uses the HTTP port.
I've already tried to fetch the external DTD via an HTTP request, but it gets redirected to the HTTPS port.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE book PUBLIC "-//AA//Test//EN" "***">
<!-- <!DOCTYPE book PUBLIC "-//AA//Test//EN" "***"> -->
<book>
<book-meta>
<book-id pub-id-type="other">handbook</book-id>
<book-title-group Id="1">
<book-title name="Hallo">The NCBI Handbook</book-title>
</book-title-group>
</book-meta>
</book>
For completeness here is an example DTD.
<!ELEMENT book ANY>
<!ATTLIST book
Release CDATA "v0.0.1"
>
<!ELEMENT book-meta ANY> <!-- # related objects: 0 -->
<!ATTLIST book-meta
Value CDATA "Das ist eine Information"
>
<!ELEMENT book-id ANY> <!-- # related objects: 0 -->
<!ATTLIST book-id
pub-id-type CDATA #REQUIRED
>
<!ELEMENT book-title-group ANY> <!-- # related objects: 0 -->
<!ATTLIST book-title-group
Id CDATA #IMPLIED
>
<!ELEMENT book-title ANY> <!-- # related objects: 0 -->
<!ATTLIST book-title
name CDATA #REQUIRED
>
For parsing the XML files I use a python script with the library lxml.
Following is the test program.
from lxml import etree
# no_network=False allows the parser to fetch the external DTD over the network.
myParser = etree.XMLParser(attribute_defaults=True, dtd_validation=True, load_dtd=True, no_network=False)
xmlFile = etree.parse("XML_DTDValidation.xml", parser=myParser)
xmlFile.xinclude()
xmlFile.write("XML_DTDValidation_out.xml", method="xml", xml_declaration=True, encoding='utf-8', pretty_print=True)
I hope I could summarize my problem well and someone can help me.

This page describes some ways to work around this.
You can either:
Set up an XML catalog (which you could use to point the parser at a copy of the DTD stored somewhere local), or
Create your own resolver class that either redirects the URL or retrieves the DTD from somewhere else.
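The second option can be sketched with lxml's etree.Resolver. This is a minimal sketch under stated assumptions: DTD_URL is a hypothetical address standing in for the HTTPS-only location, LOCAL_DTD is a shortened copy of the example DTD, and the class and function names are made up. As far as I know, the underlying libxml2 network loader handles only plain HTTP and FTP, which is why a resolver has to supply the DTD itself.

```python
from io import BytesIO

from lxml import etree

# Hypothetical URL standing in for the HTTPS-only DTD location.
DTD_URL = "https://example.org/dtd/book.dtd"

# Local copy of the DTD; in practice this could be read from a file on disk.
LOCAL_DTD = (
    '<!ELEMENT book ANY>\n'
    '<!ATTLIST book Release CDATA "v0.0.1">\n'
    '<!ELEMENT book-id ANY>\n'
    '<!ATTLIST book-id pub-id-type CDATA #REQUIRED>\n'
)


class LocalDTDResolver(etree.Resolver):
    """Serve the DTD from memory instead of letting the parser fetch it."""

    def resolve(self, system_url, public_id, context):
        if system_url == DTD_URL:
            return self.resolve_string(LOCAL_DTD, context)
        return None  # anything else falls back to the default lookup


def parse_with_local_dtd(xml_bytes):
    parser = etree.XMLParser(
        load_dtd=True, dtd_validation=True, attribute_defaults=True)
    parser.resolvers.add(LocalDTDResolver())
    return etree.parse(BytesIO(xml_bytes), parser)


SAMPLE = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b'<!DOCTYPE book PUBLIC "-//AA//Test//EN" "https://example.org/dtd/book.dtd">\n'
    b'<book><book-id pub-id-type="other">handbook</book-id></book>'
)

tree = parse_with_local_dtd(SAMPLE)
```

Instead of a hard-coded string, resolve() could presumably download the DTD over HTTPS with another library and hand the text to resolve_string(), which keeps the DOCTYPE in the XML files unchanged.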

Template render on HTML AND CSS using Google App Engine

I'm currently working on my website hosted on GAE.
It has not been updated for a while, so now I'm trying to refresh it :D
To do so, I'm trying to follow the MVC pattern using Python, WSGI, webapp2 and template.render.
Everything works, except for the CSS part.
Specifically, I can't render some parts of my CSS using the GAE (Django) template method.
My Python controller loads the HTML file and correctly replaces the variables with the dict() values.
Now, to keep the number of CSS files small, I'm trying to do the same thing for the CSS.
Unfortunately, I don't know how I'm supposed to call the CSS file.
I'm currently referencing my CSS from my HTML as usual:
<link rel="stylesheet" media="screen" type="text/css" href="/assets/css/struct/index.css">
And trying to dynamically render this part of the file:
header#navigation{
height:auto;
min-height:480px;
width:100%;
min-width:100%;
background-image:url('/assets/img/content/{{content_cat_name}}/cat_img.jpg');
background-repeat:no-repeat;
background-position: left top;
background-size:contain;
background-color:#efefef;
}
Everything is then called from my Python code like this:
import os
import webapp2
from google.appengine.ext.webapp import template
class URIHandler(webapp2.RequestHandler):
    def get(self, subdomain, page):
        name = subdomain
        pattern = os.path.join(os.path.dirname(__file__), '../views', 'index.html')
        template_values = {
            'content_cat_name': name,
            'cat_menu_title': name,
            'cat_menu_text': name,
        }
        self.response.out.write(template.render(pattern, template_values))
So, if someone could help me call my CSS correctly and replace the variables using my Python script, I'd be really happy :D
Thanks in advance.

template.render can only replace tokens in the file that you specify in the path parameter (the first parameter). You're serving the .css file out of a static directory, so no token replacement happens, because that file's not getting passed through that code.
You could inline the parts of your CSS that contain tokens in your index.html file.

I was having what I think is the same problem, and I found this GAE documentation very helpful. In short, you need to go into your app.yaml file and create a new handler:
- url: /foldername
  static_dir: foldername
And then in your link tag:
href="foldername/index.css"

Generating XML/Feed for your Python Blog

I've been trying to add RSS feeds to my blog (a webapp2 application with Jinja2 templates). This is what I have:
class XMLHandler(Handler):
    def get(self):
        posts = db.GqlQuery("SELECT * FROM Post WHERE is_draft = FALSE ORDER BY created DESC")
        self.render("xmltemplate.xml", posts=posts, mimetype='application/xml')
xmltemplate.xml looks like this:
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<channel>
<title>Blag</title>
<link>http://www.blagonudacity.appspot.com/</link>
<description>Yet Another Blag</description>
{%for post in posts %}
<entry>
<title>{{post.subject}}></title>
<link href="http://www.blagonudacity.appspot.com/post/{{ post.key().id()}}" rel="alternate" />
<updated>{{post.created}}</updated>
<author><name>Prakhar Srivastav</name></author>
<summary type="html"> {{ post.content }} </summary>
</entry>
{%endfor%}
</channel>
</feed>
What I get in my browser when I navigate to the relevant page, /feeds/all.atom.xml, is just an HTML page with the markup. It doesn't look the way XML pages usually look in a browser. What am I doing wrong here? Here is the demo

I saw that the page is delivered with content type text/html, which could be one problem. I suggest setting it to text/xml (more details can be found here).
How the feed is displayed also depends heavily on the browser. I guess you are using Chrome (like me), where the link you provided looks like a web page. If you open it in Firefox you will see the "live bookmark" styled page, although the entries don't show. I'm not sure whether this is because of a problem with your markup or a problem with Firefox and Atom feeds.
The XML file itself seems to be OK (checked with the W3 validator).
UPDATE: OK, there seems to be something wrong with your Atom XML. It is valid XML, as mentioned above, but it does not seem to be valid Atom data (according to the feed validator).
I tried to bookmark it in Firefox and it does not show any entries (just like the preview mentioned above).
So I think you should take a look at the Atom feed format; e.g. this and this could help.
I'm not really sure, but looking at your XML I think you may have mixed up Atom and RSS a little.
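To illustrate the difference, here is a minimal sketch of an Atom document built with the standard library: entries sit directly under feed (there is no RSS-style channel wrapper), and both the feed and each entry carry id, title and updated. The helper names and sample values are assumptions, and the output has not been run through the feed validator.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def _child(parent, tag, text=None, **attrs):
    """Append a namespaced child element and return it."""
    node = ET.SubElement(parent, "{%s}%s" % (ATOM_NS, tag), attrs)
    node.text = text
    return node


def build_atom_feed(feed_id, title, author, updated, entries):
    """Build a minimal Atom feed; `entries` is a list of dicts."""
    ET.register_namespace("", ATOM_NS)  # serialize Atom as the default namespace
    feed = ET.Element("{%s}feed" % ATOM_NS)
    _child(feed, "id", feed_id)
    _child(feed, "title", title)
    _child(feed, "updated", updated)
    _child(_child(feed, "author"), "name", author)
    for item in entries:
        entry = _child(feed, "entry")  # directly under feed, no channel
        _child(entry, "id", item["id"])
        _child(entry, "title", item["title"])
        _child(entry, "updated", item["updated"])
        _child(entry, "link", rel="alternate", href=item["href"])
    return ET.tostring(feed, encoding="unicode")


feed_xml = build_atom_feed(
    "http://example.org/feed", "Example Blog", "Jane Doe",
    "2011-09-17T08:14:49Z",
    [{"id": "http://example.org/post/1", "title": "First post",
      "updated": "2011-09-17T08:14:49Z",
      "href": "http://example.org/post/1"}],
)
```

Building the tree with ElementTree also takes care of escaping titles and attribute values, which a hand-written template has to handle itself.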

Generating RSS feed under Google App Engine

I want to provide an RSS feed under Google App Engine/Python.
I've tried using an ordinary request handler that generates an XML response. When I access the feed URL directly, I can see the feed correctly; however, when I try to subscribe to the feed in Google Reader, it says
'The feed being requested cannot be found.'
I wonder whether this approach is right. I was considering using a static XML file and updating it with cron jobs, but since GAE doesn't support file I/O, this approach doesn't seem workable.
How can I solve this? Thanks!

There are 2 solutions I suggest:
GAE-REST, which you can just add to your project and configure, and it will generate RSS for you. Note that the project is old and no longer maintained.
Do as I do and render the list of entities through a template. This way I succeeded in generating RSS (GeoRSS) that Google Reader can read. The template is:
<title>{{host}}</title>
<link href="http://{{host}}" rel="self"/>
<id>http://{{host}}/</id>
<updated>2011-09-17T08:14:49.875423Z</updated>
<generator uri="http://{{host}}/">{{host}}</generator>
{% for entity in entities %}
<entry>
<title><![CDATA[{{entity.title}}]]></title>
<link href="http://{{host}}/vi/{{entity.key.id}}"/>
<id>http://{{host}}/vi/{{entity.key.id}}</id>
<updated>{{entity.modified.isoformat}}Z</updated>
<author><name>{{entity.title|escape}}</name></author>
<georss:point>{{entity.geopt.lon|floatformat:2}},{{entity.geopt.lat|floatformat:2}}</georss:point>
<published>{{entity.added}}</published>
<summary type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml">{{entity.text|escape}}</div>
</summary>
</entry>
{% endfor %}
</feed>
My handler is (you can also do this with python 2.7 as just a function outside a handler for a more minimal solution):
import datetime
import os
import webapp2
from google.appengine.api import memcache
from google.appengine.ext.webapp import template
# Entity is the application's own datastore model, defined elsewhere.
class GeoRSS(webapp2.RequestHandler):
    def get(self):
        start = datetime.datetime.now() - datetime.timedelta(days=60)
        count = int(self.request.get('count') or 1000)
        # memcache.get returns None on a cache miss rather than raising.
        entities = memcache.get('entities')
        if entities is None:
            entities = Entity.all().filter('modified >', start) \
                .filter('published =', True).order('-modified').fetch(count)
            memcache.set('entities', entities)
        template_values = {
            'entities': entities,
            'request': self.request,
            'host': os.environ.get('HTTP_HOST', os.environ['SERVER_NAME']),
        }
        dispatch = 'templates/georss.html'
        path = os.path.join(os.path.dirname(__file__), dispatch)
        output = template.render(path, template_values)
        self.response.headers['Cache-Control'] = 'public,max-age=%s' % 86400
        self.response.headers['Content-Type'] = 'application/rss+xml'
        self.response.out.write(output)
I hope some of this works for you, both ways worked for me.

I have an Atom feed generator for my blog, which runs on AppEngine/Python. I use the Django 1.2 template engine to construct the feed. My template looks like this:
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"
xml:lang="en"
xml:base="http://www.example.org">
<id>urn:uuid:4FC292A4-C69C-4126-A9E5-4C65B6566E05</id>
<title>Adam Crossland's Blog</title>
<subtitle>opinions and rants on software and...things</subtitle>
<updated>{{ updated }}</updated>
<author>
<name>Adam Crossland</name>
<email>adam#adamcrossland.net</email>
</author>
<link href="http://blog.adamcrossland.net/" />
<link rel="self" href="http://blog.adamcrossland.net/home/feed" />
{% for each_post in posts %}{{ each_post.to_atom|safe }}
{% endfor %}
</feed>
Note: if you use any of this, you'll need to create your own uuid to go into the id node.
The updated node should contain the date and time at which the contents of the feed were last updated, in RFC 3339 format. Fortunately, there is a Python library (rfc3339) that takes care of this for you. An excerpt from the controller that generates the feed:
from rfc3339 import rfc3339
posts = Post.get_all_posts()
self.context['posts'] = posts
# Initially, we'll assume that there are no posts in the blog and provide
# an empty date.
self.context['updated'] = ""
if posts is not None and len(posts) > 0:
    # But there are posts, so we will pick the most recent one to get a good
    # value for updated.
    self.context['updated'] = rfc3339(posts[0].updated(), utc=True)
response.content_type = "application/atom+xml"
Don't worry about the self.context['updated'] stuff; that's just how my framework provides a shortcut for setting template variables. The important part is that I encode the date that I want to use with the rfc3339 function. Also, I set the content_type property of the Response object to application/atom+xml.
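If the rfc3339 package is not available, a similar timestamp can be produced with the standard library alone. This is a different technique from the one used in the answer, shown as a Python 3 sketch; the to_rfc3339 name is an assumption, and the function expects a timezone-aware datetime.

```python
from datetime import datetime, timedelta, timezone


def to_rfc3339(dt):
    """Format a timezone-aware datetime as an RFC 3339 UTC timestamp."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# A timestamp that is already in UTC.
stamp = to_rfc3339(datetime(2011, 9, 17, 8, 14, 49, tzinfo=timezone.utc))

# The same instant expressed with a +02:00 offset converts to the same string.
local = datetime(2011, 9, 17, 10, 14, 49, tzinfo=timezone(timedelta(hours=2)))
stamp_from_offset = to_rfc3339(local)
```

Both values come out as 2011-09-17T08:14:49Z, the form expected in the updated element.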
The only other missing piece is that the template uses a method called to_atom to turn the Post object into Atom-formatted data:
def to_atom(self):
    "Create an ATOM entry block to represent this Post."
    from rfc3339 import rfc3339
    url_for = self.url_for()
    atom_out = "<entry>\n\t<title>%s</title>\n\t<link href=\"http://blog.adamcrossland.net/%s\" />\n\t<id>%s</id>\n\t<summary>%s</summary>\n\t<updated>%s</updated>\n </entry>" % (self.title, url_for, self.slug_text, self.summary_for(), rfc3339(self.updated(), utc=True))
    return atom_out
That's all that is required as far as I know, and this code does generate a perfectly-nice and working feed for my blog. Now, if you really want to do RSS instead of Atom, you'll need to change the format of the feed template, the Post template and the content_type, but I think that is the essence of what you need to do to get a feed generated from an AppEngine/Python application.
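One caveat when building entries by string interpolation, as to_atom does: a title or summary containing & or < would produce malformed XML. A hedged sketch of the same idea with escaping added, using the standard library's xml.sax.saxutils; the entry_xml function and its parameters are hypothetical and not part of the original blog code.

```python
from xml.sax.saxutils import escape, quoteattr


def entry_xml(title, href, entry_id, summary, updated):
    """Build one Atom entry, escaping text and attribute values."""
    return (
        "<entry>\n"
        "\t<title>%s</title>\n"
        "\t<link href=%s />\n"  # quoteattr() supplies the surrounding quotes
        "\t<id>%s</id>\n"
        "\t<summary>%s</summary>\n"
        "\t<updated>%s</updated>\n"
        "</entry>"
    ) % (escape(title), quoteattr(href), escape(entry_id),
         escape(summary), updated)


entry = entry_xml(
    "Tom & Jerry <3",
    "http://example.org/?a=1&b=2",
    "tom-and-jerry",
    "A short summary",
    "2011-09-17T08:14:49Z",
)
```

Since the template inserts the result with the safe filter, whatever the method returns reaches the feed unmodified, so the escaping has to happen at this point.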

There's nothing special about generating XML as opposed to HTML - provided you set the content type correctly. Pass your feed to the validator at http://validator.w3.org/feed/ and it will tell you what's wrong with it.
If that doesn't help, you'll need to show us your source - we can't debug your code for you if you won't show it to us.
