Jython, Jepp or Pylons for the performance - python

I'm trying to incorporate server-based code diff and highlighting in my GWT (Java) project. I managed to incorporate Pygments and difflib into my code using Jython. The basic idea is to generate complete markup on the server and then simply inject code into the page as innerHTML.
I found Jython completely inadequate: even for relatively small files (2K-3K lines), Pygments or difflib takes forever (minutes, not seconds) to process them. Difflib reliably causes OOM errors in a process with a dedicated 500M of memory.
So I'm wondering if my current setup is wrong or Jython is simply unsuitable for this purpose?
If so, what's next? I discovered Jepp, but then I would have to build my project for each platform, and it has little documentation and doesn't seem very stable. Another possibility would be to run Pylons as a separate web service on the same host and either send the markup directly to the client or channel it through the server. Yet another way is to execute a Python script as a system process from Java and capture the output.
I would be very interested to hear solid suggestions on the matter.

Having a separate service sounds like the best way to go. For Pygments, there is already a service available (on Google App Engine). The source for the app is BSD open source and on GitHub here. You could adapt this to add difflib functionality too, of course.
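As a rough illustration of the difflib half of such a service, here is a minimal sketch that runs under CPython and uses only the standard library. The function name `render_diff_table` and its parameters are made up for this example; the resulting HTML fragment is the kind of markup that could be returned to the GWT client and injected as innerHTML.

```python
import difflib


def render_diff_table(old_text, new_text, context_lines=3):
    """Return an HTML <table> showing a side-by-side diff of two texts.

    Hypothetical helper: the name and arguments are illustrative only.
    """
    differ = difflib.HtmlDiff(tabsize=4)
    return differ.make_table(
        old_text.splitlines(),
        new_text.splitlines(),
        fromdesc="old",
        todesc="new",
        context=True,            # show only changed regions plus context
        numlines=context_lines,  # lines of context around each change
    )
```

A web handler (Bottle, Pylons, or plain WSGI) would call this with the two file contents and send the returned string as the response body.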

I'm going to accept the answer above since it coincides with my findings, but just to let anyone who reads this know: running a separate web service for Pygments using a Python-native solution such as Bottle performs many times better than embedded Jython, especially on Linux.

Related

How to build a web service with one sandboxed Python (VM) per request

As part of an effort to make the scikit-image examples gallery interactive, I would like to build a web service that receives a Python code snippet, executes it, and provides me with the generated output image.
For safety, the Python instances launched should be sandboxed and resource controlled, so I was thinking of using LXC containers.
Is this a good way to approach the problem? If so, what is the recommended way of launching one Python VM per request?
Stefan, perhaps "Docker" could be of use? I get the impression that you could constrain the VM that the application is run in -- an example web service:
http://docs.docker.io/en/latest/examples/python_web_app/
You could try running the application on Digital Ocean, like so:
https://www.digitalocean.com/community/articles/how-to-install-and-use-docker-getting-started
[disclaimer: I'm an engineer at Continuum working on Wakari]
Wakari Enterprise (http://enterprise.wakari.io) is aiming to do exactly this, and we're hoping to back-port the functionality into Wakari Cloud (http://wakari.io) so "published" IPython Notebooks can have some knobs on them for variable input control, then they can be "invoked" in a sandboxed state, and then the output given back to the user.
However for things that exist now, you should look at Sage Notebook. A few years ago several people worked hard on a Sage Notebook Cell Server that could do exactly what you were asking for: execute small code snippets. I haven't followed it since then, but it seems it is still alive and well from a quick search:
http://sagecell.sagemath.org/?q=ejwwif
http://sagecell.sagemath.org
http://www.sagemath.org/eval.html
For the last URL, check out Graphics->Mandelbrot and you can see that Sage already has some great capabilities for UI widgets that are tied to the "cell execution".
I think Docker is the way to go for this. The instances are very lightweight, and Docker is designed to spawn hundreds of instances at a time (spin-up time is a fraction of a second, versus a couple of seconds for traditional VMs). Configured correctly, I believe it also gives you a completely sandboxed environment. Then you don't need to worry about sandboxing Python itself :-D
I'm not sure if you really have to go as far as setting up LXC containers:
There is seccomp-nurse, a Python sandbox that leverages the seccomp feature of the Linux kernel.
Another option would be to use PyPy, which has explicit support for sandboxing out of the box.
In any case, do not use pysandbox, it is broken by design and has severe security risks.
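Whichever isolation layer is chosen, the "one Python process per request" part can be sketched with the standard library. The example below is only an assumption-laden illustration: `run_snippet` and `_cap_cpu` are hypothetical names, it works on POSIX systems only, and it caps CPU and wall-clock time but is not a sandbox by itself. It would still need to run inside a container, seccomp filter, or similar.

```python
import resource
import subprocess
import sys


def _cap_cpu(seconds):
    """Build a hook that lowers the soft CPU-time limit in the child process."""
    def apply():
        _, hard = resource.getrlimit(resource.RLIMIT_CPU)
        # Never ask for more than the existing hard limit allows.
        soft = seconds if hard == resource.RLIM_INFINITY else min(seconds, hard)
        resource.setrlimit(resource.RLIMIT_CPU, (soft, hard))
    return apply


def run_snippet(code, cpu_seconds=2, wall_seconds=10):
    """Run a code snippet in a fresh interpreter and capture its output.

    Raises subprocess.TimeoutExpired if the wall-clock limit is exceeded.
    """
    proc = subprocess.run(
        [sys.executable, "-I", "-c", code],  # -I: isolated mode, ignores user site/env
        capture_output=True,
        text=True,
        timeout=wall_seconds,
        preexec_fn=_cap_cpu(cpu_seconds),    # runs in the child before exec (POSIX only)
    )
    return proc.returncode, proc.stdout, proc.stderr
```

For the image-generating use case, the snippet would write its figure to a per-request scratch directory that the web service reads back after the process exits.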

Extensible Local HTTP Server with Framework in Python

I'm trying to build a desktop application using Python. To make it usable on as many platforms as possible, I think a web UI may be a good choice. This boils down to the problem of setting up a local HTTP server first. I did some research and found that people mainly talk about BaseHTTPServer and SimpleHTTPServer. For prototyping, subclassing them may suffice.
Besides pure prototyping, I also want to leave some room for extension to a real service. That is, once the code is mature, I'd like to move it to a real dedicated HTTP server, so that end users only need a browser to use it.
I say "extensible" in the following sense:
The code modifications needed during migration should be as small as possible.
I will focus on the algorithms in the prototyping stage, but I also want to leave some room for a future front-end designer.
It looks like WSGI + Django is a widely mentioned combination. After some searching, what I found was WSGI used with Apache or nginx. Is it possible to use self-contained modules instead, i.e. wsgiref + Django, so that I can start everything from one entry script? I don't want to bother potential first adopters by asking them to install Apache and configure it. It would be very good if you have sample code or pointers for further reading.
I'm new to Python and to web programming in Python, so I just want to make sure I'm on the right track. My underlying algorithms are implemented in Python 2.7, so the UI solution had better also be in Python 2.7. Thanks for your help.
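For reference, the "one entry script" idea can be sketched with wsgiref alone. This is a minimal illustration, not a recommendation: the names `app` and `serve` and the host/port defaults are assumptions, and the code is written for Python 3 (wsgiref also ships with 2.7). Because `app` is a plain WSGI callable, the same object could later be mounted under Apache/mod_wsgi or another production server without changes.

```python
from wsgiref.simple_server import make_server


def app(environ, start_response):
    """A plain WSGI application: echoes the requested path."""
    body = ("Hello from " + environ.get("PATH_INFO", "/")).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def serve(host="127.0.0.1", port=8000):
    """Start the bundled development server; blocks until interrupted."""
    httpd = make_server(host, port, app)
    httpd.serve_forever()
```

Calling `serve()` at the bottom of the entry script is enough to start the local UI; end users would then open http://127.0.0.1:8000/ in a browser.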
I think what you may want is Bottle. It is a web framework that only needs the standard library to be installed. It is also compatible with many production servers, and it ships with its own development server. And if that isn't good enough, it is all in a single file and supports many different templating languages, as well as its own built-in templating language.
Check it out here: http://bottlepy.org/docs/dev/
As mentioned, Bottle is a good choice. I personally like Flask, which, if I recall correctly, is what Bottle is based off of. Anyway, there are three things that really make Flask a joy to use:
Blueprints - essentially an application architecture
Flask-Sijax - allows for comet technology
Celery - an asynchronous task queue/job queue based on distributed message passing
There are a lot of other plugins, including one for an admin interface that I haven't tried out yet but that looks promising, and it works with Python 2.7.

Web gateway interfaces in Python 3

I've finally concluded that I can no longer afford to just hope the ongoing Py3k/WSGI issues will be resolved anytime soon, so I need to get ready to move on.
Unfortunately, my available options don't seem a whole lot better:
While I find a few different Python modules for FastCGI scattered around the web, none of them seem to be getting much (if any) attention and/or maintenance, particularly with regard to Python 3.x, and it's difficult to distinguish which, if any, are really viable.
Falling all the way back to the built-in CGI module is hardly better than building something myself from scratch (worse, there's an important bug or two in there that may not get attention until Python 3.3).
There is no higher sin than handling HTTP directly in a production webapp. And anyway, that's still reinventing the wheel.
Surely somebody out there is deploying webapps on 3.x in production. What gateway interface are you using, with which module/libraries, and why?
CherryPy 3.2 release candidates support Python 3.x. Because it supports WSGI only at the web server interface layer and not through the whole stack, you are isolated from the question of whether WSGI will change. CherryPy has its own internal WSGI server, but it can also run under Apache/mod_wsgi with Python 3.1+. See:
http://www.cherrypy.org/wiki/WhatsNewIn32
http://code.google.com/p/modwsgi/wiki/SupportForPython3X
bottle supports Python 3, but it suffers from the broken stdlib. However, multipart reimplements cgi.FieldStorage and can be used with bottle to build a Python 3 WSGI web app. I just published a demo. For the moment it is just a test, but as far as I can tell it works well.

What modules ought I to consider in Python if I wish to use CGI sessions?

Given that I know no web frameworks in Python and would like to keep it Very Simple at the moment (as I am Very Stupid), for what is a prototype of sketchy longevity, are there any streamlined, simple, "batteries-included" modules for this? (It is also too early in my Python career to evaluate frameworks, select one, and learn it.) I see a module named "Cookie," which could serve as a foundation, but nothing session-specific.
I'm familiar with the basic session concepts, having used them in classic ASP and gotten into the nuts-and-bolts of them in Perl, but I am not seeing a lot for Python. Beaker looks interesting, but then the documentation seems to require middleware with WSGI and I'm back to the frameworks problem.
I've found an old recipe on ActiveState for sessions, which could obviously use some buffing up. The information being held is not anything anyone would mind having been grabbed, so while I am normally quite security conscious, I would be willing to be a little bit more lax with this prototype.
Or is this a "roll-your-own" problem?
I will be using Python 2.6 on IIS 7.0.
I think web2py (a web framework) is easy enough for you. I think it is the simplest way to make a website or web service. It will also be easier than understanding Cookie or the other web-related Python modules.
You can start a session, by just typing:
session.your_session_name = "blabla" # or whatever you want to store
To make a cookie, just look here.
In web2py you don't have to configure anything. Just download it and start web2py.py. (you must have python 2.6 < installed.) You can also find some examples and a web-slide.
The Python Cookie module does nothing more than hold some values in a dictionary-like object; I think you have to store the session data on your hard disk yourself.
CherryPy is worth looking into. Yes it is a framework, and yes it requires WSGI, but it is extremely lightweight compared to other more robust alternatives.
There is another question that was answered on SO that gives a brief example on how to manage sessions with CherryPy. As you can see it makes it very easy to get up and running quickly.
Lastly, here is a little document about setting up IIS for use with CherryPy.
WSGI is not a framework, nor does it require that you choose one -- it's THE standard way to run any Python web app framework on any Python-supporting web server, including a CGI one. If you have a WSGI application named app, and want to run it on CGI, see the docs and use wsgiref.handlers.CGIHandler().run(app), as the docs say.
So, you can perfectly well use Beaker via WSGI (on top of CGI) -- e.g., take the example in Beaker's docs and just add the needed imports and the run call above (using the wsgi_app object that example constructs, plus of course a session.save() call as needed, as the Beaker docs explain right afterwards).
Rich or heavy frameworks have their place but so do lightweight, flexible components like Beaker -- and WSGI middleware is a great way to leverage such components without requiring any "framework-y" arrangements, just good old WSGI (on top of CGI or anything else).
BTW, the best way to run WSGI on IIS might be isapi-wsgi (I can only say "might" because I have no IIS installation on which to test it;-). But as long as you code to WSGI (with any framework or with none at all), that will only be an optimization -- your application won't change (net of what handler's run or equivalent method you need to call;-) whether it's running on CGI, IIS via ISAPI, Google App Engine, or any other server-and-interface-thereto combination.
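The WSGI-on-CGI arrangement described in this answer can be sketched as follows. The application body and the `run_as_cgi` wrapper are made-up illustrations; only `wsgiref.handlers.CGIHandler().run(app)` comes from the answer itself. Middleware such as Beaker's would wrap `app` before it is handed to the handler.

```python
from wsgiref.handlers import CGIHandler


def app(environ, start_response):
    """A framework-free WSGI application reporting how it was called."""
    method = environ.get("REQUEST_METHOD", "GET")
    query = environ.get("QUERY_STRING", "")
    body = ("%s request, query=%r\n" % (method, query)).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def run_as_cgi():
    """Entry point for a CGI script: reads os.environ, writes to stdout."""
    CGIHandler().run(app)
```

Moving to isapi-wsgi or another server later would mean replacing only `run_as_cgi`; `app` stays the same.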

Tornado and Python 3.x

I really like Tornado and I would like to use it with Python 3, though it is written for Python versions 2.5 and 2.6.
Unfortunately it seems like the project's source doesn't come with a test suite. If I understand correctly, the WSGI part of it wouldn't be that easy to port, as its spec is not ready for Python 3 yet (?), but I am mostly interested in Tornado's async features, so WSGI compatibility is not my main concern, even if it would be nice.
Basically I would like to know what to look into or pay attention to when trying to port it, or whether there are already ports/forks (I could not find any using Google or browsing GitHub, though I might have missed something).
First of all, I want to apologize for answering an outdated topic,
but since I found this topic through Google, I want to add some important information!
Tornado 2.0 adds support for Python 3.2!
https://github.com/facebook/tornado/blob/master/setup.py
http://groups.google.com/group/python-tornado/browse_thread/thread/69415c13d129578b
Software without a decent test suite is legacy software -- even if it was released yesterday!-) -- so the first important step is to start building a test suite; I recommend Feathers' book in the URL, but you can start with this PDF, which is an essay, also by Feathers, preceding the book and summarizing one of the book's core ideas and practices.
Once you do have the start of a test suite, run it with Python 2.6 and a -3 flag to warn you of things 2to3 may stumble on; once those are fixed, it's time to try 2to3 and try the test suite with Python 3. You'll no doubt have to keep beefing up the test suite as you go, and I recommend regularly submitting all the improvements to the upstream Tornado open source project -- those tests will be useful to anybody who needs to maintain or port Tornado, after all, not just to people interested in Python 3, so, with luck, you might gain followers and more and more contributors to the test suite.
I can't believe that people are releasing major open source projects, in 2009!!!, without decent test suites, but I'm trusting you that this is indeed what the Tornadoers have done...
Tornado is a good web framework over something that kind of looks like Twisted, but doesn't have Twisted's bug fixes or features. I did a port to Twisted a while back that essentially just removed code.
Some of these features are very important. For example, if you're doing WSGI, you're blocking a non-blocking web framework. Bad Things will happen. Twisted's async web framework also has a WSGI container, but it uses deferToThread to prevent it from blocking other requests. Still not the right way to scale an app, but it falls apart much more slowly.
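The idea behind pushing a blocking call onto a thread can be illustrated without Twisted. The sketch below swaps in the standard library's asyncio and `run_in_executor` in place of Twisted's deferToThread, so it is an analogy rather than Twisted's or Tornado's actual code; `blocking_wsgi_call` and `demo` are hypothetical names, and the sleep stands in for a slow synchronous WSGI handler.

```python
import asyncio
import time


def blocking_wsgi_call():
    """Stand-in for a synchronous WSGI handler that takes a while."""
    time.sleep(0.2)
    return "done"


async def _main():
    loop = asyncio.get_running_loop()
    # Hand the blocking call to the default thread pool.
    future = loop.run_in_executor(None, blocking_wsgi_call)
    ticks = 0
    # The event loop stays free to do other work while the thread is busy.
    while not future.done():
        ticks += 1
        await asyncio.sleep(0.01)
    return await future, ticks


def demo():
    """Return the handler's result and how often the loop ran meanwhile."""
    return asyncio.run(_main())
```

If `blocking_wsgi_call()` were called directly inside the coroutine instead, the tick counter would never advance during the sleep, which is the "blocking a non-blocking framework" problem described above.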
