I have a Flask application using the Jinja2 template engine. The content is dynamic, pulling particular bits from a database.
I'd like to turn a particular page static with all the dynamic data intact. However, I'd like this to run every hour or so, as the database will continue to receive new data, hence why I'm not just using an existing static generator or building the static page manually. I will cron the job to run automatically.
Any thoughts on how this might be done? I can't provide code examples as I simply don't have a clue on how I might tackle this.
Any help to get me started would be much appreciated.
You can use Frozen-Flask to convert a dynamic Flask app to a static site. It can discover most pages on its own, assuming each page is linked from another page, such as a list of blog posts linking to individual posts. There are also ways to tell it about other pages if they are not discovered automatically. You could run this periodically with a cron job to update the static site regularly.
freeze_app.py:
from flask_frozen import Freezer
from myapp import app  # your existing dynamic Flask app

freezer = Freezer(app)

if __name__ == "__main__":
    # Writes the static site to ./build by default (FREEZER_DESTINATION).
    freezer.freeze()
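To refresh the output hourly, a crontab entry along these lines would do it (the paths here are illustrative):

0 * * * * cd /path/to/myapp && /path/to/venv/bin/python freeze_app.py

Each run rewrites the files in the build directory, so a web server pointed at that directory always serves the latest snapshot.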
Related
I have a question about the optimal structure of my files.
Right now, I have all my URLs in a file api.py.
The endpoints for a new version of the app are going into a new api_v2.py file, containing only the URLs that require a new version. Is that OK?
Also, if new functionality is added to the app, should I put the new URLs directly in api_v2.py? Or in the base api.py, keeping api_v2.py for the URLs that have a "new version" of themselves?
Any advice will help.
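For what it's worth, one common pattern is to mount each version under its own prefix and keep only the endpoints that actually changed in the newer module. A minimal sketch, assuming a Flask-style app (the framework isn't stated, and all names here are illustrative):

from flask import Flask, Blueprint

api_v1 = Blueprint("api_v1", __name__)  # the original api.py endpoints
api_v2 = Blueprint("api_v2", __name__)  # only endpoints whose contract changed

@api_v1.route("/items")
def items_v1():
    return "v1 items"

@api_v2.route("/items")
def items_v2():
    return "v2 items"

app = Flask(__name__)
app.register_blueprint(api_v1, url_prefix="/api/v1")
app.register_blueprint(api_v2, url_prefix="/api/v2")

Under a layout like this, brand-new functionality would go in the base module (it has no old version to coexist with), and api_v2.py stays a thin overlay of genuinely re-versioned endpoints.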
Okay, so I know there are already some questions about this topic, but I don't find any to be specific enough. I want to have a script on my site that will automatically generate my sitemap.xml (or save it somewhere). I know how to upload files, and I have my site set up at http://sean-behan.appspot.com with Python 2.7. How do I set up the script that will generate the sitemap? If possible, please reference the code. Just ask if you need more info. :) Thanks.
You can have outside services automatically generate them for you by traversing your site.
One such service is at http://www.xml-sitemaps.com/details-sean-behan.appspot.com.html
Alternatively, you can serve your own XML file based on the URLs you want to appear in your sitemap. In which case, see Tim Hoffman's answer.
I can't point you to code, as I don't know how your site is structured, what templating environment you use, whether your site includes static pages, etc.
The basics are: if you have code that can pull together a list of dictionaries containing the metadata about each page you want in your sitemap, then you are halfway there.
Then use a templating language (or straight Python) to generate an XML file per the sitemaps.org spec.
Now you have two choices: dynamically serve this output as requested, or cache it (in the datastore if it is less than 1 MB when compressed, or written to Google Cloud Storage) and serve its contents when /sitemap.xml is requested. You would then set up a cron task to regenerate your cached sitemap once a day (or whatever frequency is appropriate).
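The generation step itself is small. A minimal sketch (the page list is illustrative; a real version would pull it from your datastore):

# Build sitemap XML per the sitemaps.org spec from a list of page dicts.
pages = [
    {"loc": "http://example.appspot.com/", "lastmod": "2012-04-01"},
    {"loc": "http://example.appspot.com/about", "lastmod": "2012-03-28"},
]

entries = "".join(
    "<url><loc>%(loc)s</loc><lastmod>%(lastmod)s</lastmod></url>" % p
    for p in pages
)
sitemap = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    + entries + "</urlset>"
)
# Serve this string for /sitemap.xml, or have a cron task write it to
# the datastore / Cloud Storage and serve the cached copy.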
I am trying to let users create HTML pages that they can view on my website. So the homepage would just have a place for file upload, then some script on my site would take the file, convert it into an HTML page, and place it at mysite.com/23klj4d (an identifying file name). From my understanding, this would mean that the urls.py file gets updated to route that URL to display the HTML page of the file. Is it possible to let users do this? Where would I put that conversion script?
You could simply write static files to disk, and then serve them as static files. That would be easier, but it depends on your other requirements.
But, from what I understand of your question, you'd need:
A form to upload the file
An upload handler which inserts it into the db
A view that renders based on a path
I'm not sure about the urls.py entry. You'll want something in there to separate this content from the rest of your site, and you'll probably also want something to safeguard against the file extensions that you serve there.
Note: this has "security hole" written all over it. I would be super careful in how you test this.
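A rough sketch of those pieces, following the write-to-disk route mentioned above (Django-style; names are illustrative, and a real version needs CSRF handling, size limits, and far stricter validation):

import os
import uuid

from django.conf import settings
from django.http import Http404, HttpResponse

UPLOAD_DIR = os.path.join(settings.MEDIA_ROOT, "user_pages")  # illustrative

def upload(request):
    # Store the file under a generated name, so the public URL never
    # depends on the user-supplied filename.
    f = request.FILES["page"]
    name = uuid.uuid4().hex[:8]
    with open(os.path.join(UPLOAD_DIR, name + ".html"), "wb") as out:
        for chunk in f.chunks():
            out.write(chunk)
    return HttpResponse("/pages/%s/" % name)

def show(request, name):
    # The "renders based on a path" view: only serve names we generated.
    path = os.path.join(UPLOAD_DIR, name + ".html")
    if not os.path.exists(path):
        raise Http404
    return HttpResponse(open(path, "rb").read())

A urls.py pattern such as r'^pages/(?P<name>[0-9a-f]{8})/$' keeps this content separated from the rest of the site and rejects anything that isn't one of the generated names.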
I'm trying to do three things.
One: crawl and archive, at least daily, a predefined set of sites.
Two: run overnight batch python scripts on this data (text classification).
Three: expose a Django based front end to users to let them search the crawled data.
I've been playing with Apache Nutch/Lucene but getting it to play nice with Django just seems too difficult when I could just use another crawler engine.
Question 950790 suggests I could just write the crawler in Django itself, but I'm not sure how to go about this.
Basically - any pointers to writing a crawler in Django or an existing python crawler that I could adapt? Or should I incorporate 'turning into Django-friendly stuff' in step two and write some glue code? Or, finally, should I abandon Django altogether? I really need something that can search quickly from the front end, though.
If you insert your Django project's app directories into sys.path, you can write standard Python scripts that use the Django ORM. We have an /admin/ directory that contains scripts to perform various tasks; at the top of each script is a block that looks like:
import os
import sys

# Make the Django project, its apps, and the settings module importable.
sys.path.insert(0, os.path.abspath('../my_django_project'))
sys.path.insert(0, os.path.abspath('../'))
sys.path.insert(0, os.path.abspath('../../'))
os.environ['DJANGO_SETTINGS_MODULE'] = 'settings'
Then it's just a matter of using your tool of choice to crawl the web and using the Django database API to store the data.
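For example, once that block has run, a crawler script can persist pages through an ordinary model (CrawledPage is a hypothetical model in one of your apps):

from myapp.models import CrawledPage  # hypothetical model

def archive(url, html):
    # Store one crawl result, updating in place if the URL is already known.
    page, _ = CrawledPage.objects.get_or_create(url=url)
    page.html = html
    page.save()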
You could write your own crawler, using urllib2 to fetch the pages and Beautiful Soup to parse the HTML for the content you want.
Here's an example of reading a page:
http://docs.python.org/library/urllib2.html#examples
Here's an example of parsing the page:
http://www.crummy.com/software/BeautifulSoup/documentation.html#Parsing HTML
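Put together, the core of it is only a few lines. A sketch in Python 2, matching the libraries above (the URL is illustrative):

import urllib2
from BeautifulSoup import BeautifulSoup  # Beautiful Soup 3 import style

def fetch_links(url):
    # Fetch a page and return every href on it: a crawler is essentially
    # this plus a queue of URLs still to visit.
    html = urllib2.urlopen(url).read()
    soup = BeautifulSoup(html)
    return [a['href'] for a in soup.findAll('a', href=True)]

print fetch_links('http://example.com/')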
If you don't want to write the crawler using the Django ORM (or already have a working crawler), you could share the database between the crawler and the Django-powered front end.
To be able to search (and edit) the existing database using the Django admin, you should create Django models for it.
An easy way to do that is described here:
http://docs.djangoproject.com/en/dev/howto/legacy-databases/
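That howto boils down to letting Django introspect the existing tables and write model stubs for you, which you then prune and adjust by hand:

python manage.py inspectdb > models.py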
So I've just started learning Python on WAMP. I've got the results of an HTML form using cgi, and successfully performed a database search with MySQLdb. I can return the results to a page that ends with .py by using print statements in the Python CGI code, but I want to create a webpage that's .html and have that returned to the user, and/or keep them on the same web address when the database search results return.
Thanks,
Paul
Edit: to clarify, on my local machine I see /localhost/search.html in the address bar when I submit the HTML form, and receive a results page at /localhost/cgi-bin/searchresults.py. I want to see the results on /localhost/results.html or /localhost/search.html. If this were on a public server, I'm ASSUMING it would return .../cgi-bin/searchresults.py; the last time I saw /cgi-bin/ directories in a URL was in the 90s. I've glanced at AddHandler, as David suggested, but I'm not sure if that's what I want.
Edit: thanks, all of you, for your input. Yep, without using frameworks, mod_rewrite seems the way to go, but having looked at that, I decided to save myself the trouble and go with Django on mod_wsgi, mainly because of the size of its user base and the amount of docs. I might switch to a lighter/more customisable framework once I've got the basics.
First, I'd suggest that you remember that URLs are URLs and that file extensions don't matter, and that you should just leave it.
If that isn't enough, then remember that URLs are URLs and that file extensions don't matter, and configure Apache to use a different rule to decide what is a CGI program rather than a static file to be served up as is. You can use AddHandler to register a handler for files on disk with a .html extension.
Alternatively, you could use mod_rewrite to tell Apache that …/foo.html means …/foo.py
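Either way it's a couple of lines of Apache configuration (the directory and file names here are illustrative):

# Option 1: treat .html files in this directory as CGI programs
<Directory "/var/www/site">
    Options +ExecCGI
    AddHandler cgi-script .html
</Directory>

# Option 2: rewrite the .html URL onto the existing CGI script
RewriteEngine On
RewriteRule ^/search\.html$ /cgi-bin/searchresults.py [PT,QSA]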
Finally, I'd suggest that if you do muck around with what URLs look like, you remove any sign of something that looks like a file extension (so that …/foo is requested rather than …/foo.anything).
As for keeping the user on the same address for results as for the request: that is just a matter of having the program output the basic page, without results, when it doesn't get the query string parameters that indicate a search term has been passed.
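In the CGI script that might look like this (the form field name q is illustrative):

#!/usr/bin/env python
import cgi

form = cgi.FieldStorage()
term = form.getfirst("q")  # None until a search has been submitted

print "Content-Type: text/html"
print
if term:
    # Run the MySQLdb query here and print the results table.
    print "<html><body>Results for %s</body></html>" % cgi.escape(term)
else:
    # No query string: show the plain search page at the same address.
    print '<html><body><form><input name="q"></form></body></html>'

Because the form submits back to the same URL (no action attribute), the address bar stays put for both the empty page and the results.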