Filling out paragraph text via urllib? - python

Say I have a paragraph in a site I am managing and I want a python program to change the contents of that paragraph. Is this plausible with urllib?

Quite possibly; it depends on how the site is designed.
If the site is just a collection of static pages (i.e. .html files) then you would have to get a copy of the page, modify it, and upload the new version - most likely using SFTP or WebDAV.
If the site is running a content management system (like WordPress, Drupal, Joomla, etc.) then it gets quite a bit simpler - your script can simply post the new page content.
If it is static content maintained through a template system (e.g. Dreamweaver) then things get quite a bit nastier again, because any changes you make will not be reflected in the template files, and will likely get overwritten and disappear the next time you update the site.
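For the static-page case, a minimal sketch of the fetch / modify / upload cycle might look like the following, assuming Python 3's urllib for the download and paramiko for the SFTP upload. The URL, paths, credentials and paragraph markup are all placeholders, and a real script would use an HTML parser rather than a bare string replace:

    import urllib.request
    import paramiko  # assumption: the server is writable over SFTP

    PAGE_URL = "http://example.com/about.html"     # hypothetical page
    REMOTE_PATH = "/var/www/html/about.html"       # hypothetical path on the server

    # 1. Download the current copy of the page.
    html = urllib.request.urlopen(PAGE_URL).read().decode("utf-8")

    # 2. Change the paragraph (naive string replace, for brevity).
    html = html.replace('<p id="intro">Old paragraph text.</p>',
                        '<p id="intro">New paragraph text.</p>')

    # 3. Upload the modified page over SFTP.
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect("example.com", username="deploy", password="secret")
    sftp = client.open_sftp()
    with sftp.open(REMOTE_PATH, "w") as remote_file:
        remote_file.write(html)
    sftp.close()
    client.close()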

If you have access to any server-side scripting language, it's easy.
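In that case the Python side is just an HTTP POST; the endpoint and field names below are entirely hypothetical and depend on whatever server-side script or CMS API the site exposes:

    import urllib.parse
    import urllib.request

    # Hypothetical update endpoint provided by the site's server-side script.
    ENDPOINT = "http://example.com/update_paragraph.php"

    data = urllib.parse.urlencode({
        "page": "about",
        "paragraph_id": "intro",
        "content": "New paragraph text.",
        "token": "secret-token",   # whatever authentication the endpoint expects
    }).encode("utf-8")

    # Passing data makes urlopen issue a POST request.
    response = urllib.request.urlopen(ENDPOINT, data)
    print(response.getcode(), response.read()[:200])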

Related

web2py site doesn't load all images/videos (especially larger ones)

First of all, I must say I have seen something similar to this in the web2py discussion group, but I couldn't understand it very well.
I've set up a database-driven website using web2py in which the entries are just HTML text. Most of them will contain img and/or video tags that point to relative URLs; these files are stored in folders with the address pattern static/content/article/<article-name> and the document's base href is set via the controller to make these links work. So, the images are stored and referenced directly, without all the upload/download machinery.
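For reference, the controller side of this setup looks roughly like the sketch below (table, field and function names are simplified placeholders, not my actual code); the view then puts the returned value into a <base href="..."> tag in the page head:

    # controllers/default.py -- rough sketch of how the base href is set
    def article():
        name = request.args(0) or redirect(URL('index'))
        entry = db(db.article.name == name).select().first()
        if not entry:
            redirect(URL('index'))
        # Relative img/video URLs inside entry.body resolve under this folder.
        base = URL('static', 'content/article/%s/' % name)
        return dict(entry=entry, base=base)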
I'm testing it locally using the Rocket server because I'm not allowed to install Apache on this PC.
The problem:
Everything works fine except, it seems, when several "large" files are being requested. By "large" I mean 4 MB files were enough, which isn't really a lot (and I think slightly smaller files would produce the same result). I'm pretty sure the links aren't broken, since 1) copying/pasting their URLs into the browser shows them normally, 2) the images/videos appear fine or broken at random when I refresh the page, and 3) sometimes a video loads up to a certain point and then stops, and the browser inspector shows a 'fail' signal. When I replaced these files with smaller ones (a few dozen KB each), all of them loaded. Another thing to consider is that sometimes it takes a really long time for the page to finish loading (from 2 seconds to several minutes).
The questions:
Is this the simplest/optimal way of getting the job done? I'm aware that web2py has some neat features like upload fields, but I don't know how I could make these files be effortlessly referenced in the document, considering there will be some special features in such pages involving the static files. So the solution I've come up with so far is to create a directory whose name matches the entry's and store the files there, as I said before. Is this overkill considering what web2py has to offer?
If the answer to the first question is something like "yes", then (obvious question) what may be causing the problem and how do I fix it? Does it have something to do with the fact that web2py sends static files in chunks of 1 MB? Might it be the Rocket server? Or is it because I'm testing it locally?
Thanks in advance!
It's hard to give you an answer without knowing some details...
Where is your web2py application hosted?
Do you use Apache? Nginx?
Did you deploy using a one-step deploy script? (http://web2py.googlecode.com/hg/scripts/setup-web2py-ubuntu.sh)
But in any case, you can (and should):
Configure Apache/Nginx to serve your static files directly (files in /YourApp/static/.). See the "setup-web2py-" scripts in the "scripts" folder for more information.
Use scripts/zip_static_files.py to create gzipped versions of your static files. You can create a cron job to run "python web2py.py -S myapp -R scripts/zip_static_files.py".
More details about efficiency in the book: http://web2py.com/books/default/chapter/29/13/deployment-recipes?search=static+files#Efficiency-and-scalability

Automatically generated sitemap on Google App Engine

Okay, so I know there are already some questions about this topic, but I don't find any to be specific enough. I want to have a script on my site that will automatically generate my sitemap.xml (or save it somewhere). I know how to upload files and have my site set up on http://sean-behan.appspot.com with Python 2.7. How do I set up the script that will generate the sitemap? If possible, please reference the code. Just ask if you need more info. :) Thanks.
You can have outside services automatically generate them for you by traversing your site.
One such service is at http://www.xml-sitemaps.com/details-sean-behan.appspot.com.html
Alternatively, you can serve your own XML file based on the URLs you want to appear in your site. In which case, see Tim Hoffman's answer.
I can't point you to code, as I don't know how your site is structured, what templating environment you use, whether your site includes static pages, etc.
The basics are: if you have code that can pull together a list of dictionaries containing the metadata about each page you want in your sitemap, then you are halfway there.
Then use a templating language (or straight Python) to generate an XML file as per the sitemaps.org spec.
Now you have two choices: dynamically serve this output as requested, or cache it (store it in the datastore if, when compressed, it is less than 1 MB, or write it to Google Cloud Storage) and serve its contents when /sitemap.xml is requested. You would then set up a cron task to regenerate the cached sitemap once a day (or whatever frequency is appropriate).
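A minimal sketch of the generation step, building the XML from a list of dictionaries with plain Python (no particular templating environment assumed; the page data here is made up):

    from datetime import date

    # Hypothetical page metadata pulled together by your own code.
    pages = [
        {"loc": "http://sean-behan.appspot.com/", "lastmod": date(2012, 9, 1)},
        {"loc": "http://sean-behan.appspot.com/about", "lastmod": date(2012, 9, 1)},
    ]

    def build_sitemap(pages):
        entries = []
        for page in pages:
            entries.append(
                "  <url>\n"
                "    <loc>%s</loc>\n"
                "    <lastmod>%s</lastmod>\n"
                "  </url>" % (page["loc"], page["lastmod"].isoformat())
            )
        return (
            '<?xml version="1.0" encoding="UTF-8"?>\n'
            '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
            + "\n".join(entries)
            + "\n</urlset>"
        )

    # A handler for /sitemap.xml would return this string with the
    # Content-Type header set to application/xml (or serve the cached copy).
    print(build_sitemap(pages))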

Search index for flat HTML pages

I'm looking to add search capability into an existing entirely static website. Likely, the new search functionality itself would need to be dynamic, as the search index would need to be updated periodically (as people make changes to the static content), and the search results will need to be dynamically produced when a user interacts with it. I'd hope to add this functionality using Python, as that's my preferred language, though am open to ideas.
The Google Web Search API won't work in this case because the content being indexed is on a private network. Django haystack won't work for this case, as that requires that the content be stored in Django models. A tool called mnoGoSearch might be an option, as I think it can spider a website like Google does, but I'm not sure how active that project is anymore; the project site seems a bit dated.
I'm curious about using tools like Solr, ElasticSearch, or Whoosh, though I believe that those tools are only the indexing engine and don't handle the parsing of search content. Does anyone have any recommendations as to how one may index static html content for retrieving as a set of search results? Thanks for reading and for any feedback you have.
With Solr, you would write code that retrieves the content to be indexed, parses out the target portions from each item, and then sends it to Solr for indexing.
You would then interact with Solr for search, and have it return either the entire indexed document, an ID, or some other identifying information about the original indexed content, and use that to display results to the user.
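A hedged sketch of that crawl/parse/index loop, using BeautifulSoup for the parsing and pysolr for talking to Solr (both are just example libraries; the paths, core name and field names are assumptions):

    import glob

    import pysolr
    from bs4 import BeautifulSoup

    solr = pysolr.Solr("http://localhost:8983/solr/sitesearch", timeout=10)

    docs = []
    for path in glob.glob("/var/www/site/**/*.html", recursive=True):
        with open(path, encoding="utf-8") as f:
            soup = BeautifulSoup(f.read(), "html.parser")
        # Parse out the target portions: the title and the visible body text.
        docs.append({
            "id": path,
            "title": soup.title.string if soup.title else path,
            "content": soup.get_text(" ", strip=True),
        })

    solr.add(docs)  # send everything to Solr for indexing

    # At search time, query Solr and use the stored id/title to build results.
    for hit in solr.search("some query", rows=10):
        print(hit["id"], hit.get("title"))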

Getting information about static files in Python App Engine; workarounds

I'm working on an App Engine project that will have customizable themes. I'd like to be able to use jQuery UI themes. The problem is figuring out what the CSS file is going to be named. (Typically, "jquery-ui-1.7.2.custom.css". Version numbers will change, and people tend to rename things, but there should only be one CSS file, and I'm OK with it being an error condition if there's two or more for some reason.) Because it's a static file (static files are uploaded to App Engine separately from the rest of the application's resources), I can't just glob the directory for a CSS file. I can't just assume that it's hard-coded, and I really don't want to make it a configuration setting, because that's a bad user experience.
Guido told me to symlink it so that App Engine sees two copies and can treat one as static and the other as an application resource, but symlinks don't work on Windows, and since this will ultimately be open source, I can't control which SDK the user uses. Another suggestion was to use a deploy-time script, but Mac users have this nice "Deploy" button in their version of the SDK and I'd rather not have to tell them, "Oh hey, sorry for the inconvenience, but you can't use that for this project."
I clearly need an out-of-the-box solution to this one, but I'm at a loss. Anyone have any good suggestions for how to get a custom jQuery UI theme out of the ThemeRoller and into an App Engine app? Some post-processing is already needed, because the only files in the zip file that ThemeRoller gives you are in the "css" directory. Maybe I can write something that takes a raw theme as input and spits out something useful on the other side (the deploy-time script trick, but somehow less user-unfriendly). The trick here is presentation — I want the user to spend as little time on the command line as possible. An ideal solution assumes the person performing this task is non-technical for the most part. No part of the solution can be much harder than installing something like WordPress or Drupal, and in a perfect world, it should be way, way easier.
To accomplish what you are asking, I would use the datastore for serving the CSS files, since this would allow easy listing, sorting, and even modification and uploading.
Other than that, your next best options would be to store the CSS data inside a script (a dictionary where the filename is the key and the CSS code is the value), or, as you suggested, to run a script before deploying to App Engine.
Personally, I would go for the datastore option, since it allows a great deal more user customization (such as each user being able to provide their own CSS file). Just be sure to use memcache to avoid hitting the datastore whenever possible (which should be a very common occurrence), and use HTTP headers to tell the browser to cache the CSS file locally.
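A rough sketch of that datastore-plus-memcache approach on the old webapp2/ndb stack (model, handler and route names are made up for illustration):

    import webapp2
    from google.appengine.api import memcache
    from google.appengine.ext import ndb

    class CssFile(ndb.Model):
        name = ndb.StringProperty()
        content = ndb.TextProperty()

    class CssHandler(webapp2.RequestHandler):
        def get(self, name):
            css = memcache.get("css:" + name)
            if css is None:
                entity = CssFile.query(CssFile.name == name).get()
                if entity is None:
                    self.abort(404)
                css = entity.content
                memcache.set("css:" + name, css)
            self.response.headers["Content-Type"] = "text/css"
            # Tell the browser to cache the stylesheet locally.
            self.response.headers["Cache-Control"] = "public, max-age=3600"
            self.response.write(css)

    app = webapp2.WSGIApplication([(r"/themes/(.+\.css)", CssHandler)])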

How do I output a dynamically generated web page to a .html page instead of a .py CGI page?

So I've just started learning Python on WAMP. I've got the results of an HTML form using CGI, and successfully performed a database search with MySQLdb. I can return the results to a page that ends in .py by using print statements in the Python CGI code, but I want to create a webpage that's .html and have that returned to the user, and/or keep them on the same web address when the database search results return.
Thanks,
Paul
Edit: To clarify, on my local machine I see /localhost/search.html in the address bar when I submit the HTML form, and receive a results page at /localhost/cgi-bin/searchresults.py. I want to see the results on /localhost/results.html or /localhost/search.html. If this were on a public server I'm ASSUMING it would return .../cgi-bin/searchresults.py; the last time I saw /cgi-bin/ directories in a URL was in the 90s. I've glanced at AddHandler, as David suggested, but I'm not sure if that's what I want.
Edit: Thanks all of you for your input. Yep, without using frameworks mod_rewrite seems the way to go, but having looked at that, I decided to save myself the trouble and go with Django with mod_wsgi, mainly because of the size of its user base and amount of docs. I might switch to a lighter/more customisable framework once I've got the basics.
First, I'd suggest that you remember that URLs are URLs and that file extensions don't matter, and that you should just leave it.
If that isn't enough, then remember that URLs are URLs and that file extensions don't matter, and configure Apache to use a different rule to determine what is a CGI program rather than a static file to be served up as is. You can use AddHandler to add a handler for files on the hard disk with a .html extension.
Alternatively, you could use mod_rewrite to tell Apache that …/foo.html means …/foo.py
Finally, I'd suggest that if you do muck around with what URLs look like, you remove any sign of something that looks like a file extension (so that …/foo is requested rather than …/foo.anything).
As for keeping the user on the same address for results as for the request … that is just a matter of having the program output the basic page without results if it doesn't get the query string parameters that indicate a search term had been passed.
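A minimal CGI sketch of that idea, showing the bare form when no query parameter arrives and the results otherwise (the field name and the fake result list are placeholders for the MySQLdb search you already have):

    #!/usr/bin/env python
    import cgi

    form = cgi.FieldStorage()
    query = form.getfirst("q")

    print("Content-Type: text/html")
    print("")

    print("<html><body>")
    print('<form method="get"><input name="q"><input type="submit" value="Search"></form>')

    if query:
        # Placeholder for the real MySQLdb lookup.
        results = ["first match", "second match"]
        print("<ul>")
        for row in results:
            print("<li>%s</li>" % cgi.escape(row))
        print("</ul>")

    print("</body></html>")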
