I'm currently writing a web application using Django, Apache, and mod_wsgi that provides some FreeBSD server management and configuration features, including common firewall operations.
My Python/C library uses raw sockets to interact directly with the firewall and works perfectly fine when running as root, but raw socket operations are only allowed for root.
At this point, the only thing I can think of is to install and use sudo to explicitly allow the www user access to /sbin/ipfw which isn't ideal since I would prefer to use my raw socket library operations rather than a subprocess call.
I suppose another option would be to write (local domain sockets) or use an existing job system (Celery?) that runs as root and handles these requests.
Or perhaps there's some WSGI Daemon mode trickery I'm unaware of? I'm sure this issue has been encountered before. Any advice on the best way to handle this?
Use Celery or some other back end service which runs as root. Having a web application process run as root is a security problem waiting to happen. This is why mod_wsgi blocks you running daemon processes as root. Sure you could hack the code to disable the exclusion, but I am not about to tell you how to do that.
Related
I was wondering what is the advantages of mod_wsgi. For most python web framework, I can launch (daemon) the application by python directly and serve it in a port. Then when shall I use mod_wsgi?
Since I answered your other question regarding Flask, I assume you are referring to using Flask's development server. The problem with that is it is single threaded.
Using mod_wsgi, you would be running behind apache, which will do process forking and allow for multiple simultaneous requests to be handled.
There are other options as well. Depending on your particular use case I would consider using eventlet's wsgi server.
Ubuntu 11.10, Python 2.6. Background: I have an existing Python app that is using Twisted to sit in a loop and wait for RESTful commands to come in. So the app starts up, kicks off threads that do various things, and main sets up callbacks for Twisted, then calls Twisted.reactor.run(), which blocks forever. When a request comes in, the appropriate handler is called, stuff happens, a reply is sent back.
My job is now to remove Twisted because management has decided they don't like it. We're moving to Apache as our web server.
Using the documentation, I have successfully installed and configured Apache2.0 to serve web pages. I also installed mod_wsgi, and was able to configure it and Apache to execute arbitrary Python code when a request comes in. So I'm good on that side.
What I'm missing is how to connect my Python application to the Apache/mod_wsgi bits, since the application needs to be persistent and always running. It was suggested that I open a pipe between my wsgi script and my main application, and serialize the requests that way. But it seems like this is something that should already be out there, I just don't know enough to know what to search for.
Any pushes in the right direction are greatly appreciated.
Further edit for clarity: I'm not making a webserver. The application in question is a host app that is running on a virtual machine. It happens to be controlled by a RESTful interface via HTTP. So all it needs to do is be able to listen for incoming commands and reply to them.
mod_wsgi may not be the proper tool for this job, which is fine, I just don't know what is.
Does the daemon mode of mod_wsgi offer enough persistence in your case? Or if you want to run the main process separately from Apache, how about mod_fastcgi? Maybe running Apache as a reverse proxy could be an option too.
It was suggested that I open a pipe between my wsgi script and my main application, and serialize the requests that way.
That's what multiprocessing queues are for.
http://docs.python.org/library/multiprocessing.html
http://docs.python.org/library/multiprocessing.html#pipes-and-queues
You'll be even happier if you start using Celery.
Celery will allow you to "remove Twisted because management has decided they don't like it."
However. Switch to celery means that things like "So the app starts up, kicks off threads that do various things, and main sets up callbacks for Twisted, then calls Twisted.reactor.run(), which blocks forever" all have to be completely rethought. Instead of some main polling loop, you now have multiple, independent processes that are coordinated by celery.
What you'll find is all the housekeeping in your application -- all the coordination among threads -- the callbacks -- all that -- will go away. You'll be left with a few Python scripts that do the "real work" and Celery to manage the distributed task queue.
The desktop app should start the web server on launch and should shut it down on close.
Assuming that the desktop is the only client allowed to connect to the web server, what is the best way to write this?
Both the web server and the desktop run in a blocking loop of their own. So, should I be using threads or multiprocessing?
Use something like CherryPy or paste.httpserver. You can use wsgiref's server, and it generally works okay locally, but if you are doing Ajax the single-threaded nature of wsgiref can cause some odd results, or if you ever do a subrequest you'll get a race condition. But for most cases it'll be fine. It might be useful to you not to have an embedded threaded server (both CherryPy and paste.httpserver are threaded), in which case wsgiref would be helpful (all requests will run from the same thread).
Note that if you use CherryPy or paste.httpserver all requests will automatically happen in subthreads (those packages do the thread spawning for you), and you probably will not be able to directly touch the GUI code from your web code (since GUI code usually doesn't like to be handled by threads). For any of them the server code blocks, so you need to spawn a thread to start the server in. Twisted can run in your normal GUI event loop, but unless that's important it adds a lot of complexity.
Do not use BaseHTTPServer or SimpleHTTPServer, they are silly and complicated and in all cases where you might use then you should use wsgiref instead. Every single case, as wsgiref is has a sane API (WSGI) while these servers have silly APIs.
Have a look at the BaseHTTPServer package, or better yet the SimpleHTTPServer. Pretty simple and easy to use.
In Sauce RC, we use CherryPy. Since it's pure Python, it's very easy to embed it (as source on disk or in a zip file).
Or should I be using a totally different server?
Nginx with mod_wsgi requires the use of a non-blocking asynchronous framework and setup and isn't likely to work out of box with Pylons.
I usually go with the proxy route to a stand-alone Pylons process using the PasteScript#cherrypy WSGI server (as its higher performing than the Paste#http one, though it won't recycle threads if you have leaks...).
If you're set on using Apache and its your server (so you can compile and run Apache mod_wsgi), I'd suggest using that setup as its less maintenance to effectively utilize multiple cores. With a proxy setup, you'd have to use the mod_proxy_balancer with multiple paste processes to effectively utilize multiple cores/cpus.
If you're deploying to someone else's Apache (shared hosting), mod_proxy is generally the easier solution as its stock in Apache 2.2 and above.
Personally, I usually deploy with nginx + proxy to multiple paster processes.
I've also used mod_fastcgi + flup to great success several times now. There are a couple of recipes floating around for setting this up, but unfortunately it will probably still require some tweaking on your part to get everything working:
http://wiki.pylonshq.com/display/pylonscookbook/Production+Deployment+Using+Apache,+FastCGI+and+mod_rewrite
I have a Bluehost account where I can run Python scripts as CGI. I guess it's the simplest CGI, because to run I have to define the following in .htaccess:
Options +ExecCGI
AddType text/html py
AddHandler cgi-script .py
Now, whenever I look up web programming with Python, I hear a lot about WSGI and how most frameworks use it. But I just don't understand how it all fits together, especially when my web server is given (Apache running at a host's machine) and not something I can really play with (except defining .htaccess commands).
How are WSGI, CGI, and the frameworks all connected? What do I need to know, install, and do if I want to run a web framework (say web.py or CherryPy) on my basic CGI configuration? How to install WSGI support?
How WSGI, CGI, and the frameworks are all connected?
Apache listens on port 80. It gets an HTTP request. It parses the request to find a way to respond. Apache has a LOT of choices for responding. One way to respond is to use CGI to run a script. Another way to respond is to simply serve a file.
In the case of CGI, Apache prepares an environment and invokes the script through the CGI protocol. This is a standard Unix Fork/Exec situation -- the CGI subprocess inherits an OS environment including the socket and stdout. The CGI subprocess writes a response, which goes back to Apache; Apache sends this response to the browser.
CGI is primitive and annoying. Mostly because it forks a subprocess for every request, and subprocess must exit or close stdout and stderr to signify end of response.
WSGI is an interface that is based on the CGI design pattern. It is not necessarily CGI -- it does not have to fork a subprocess for each request. It can be CGI, but it doesn't have to be.
WSGI adds to the CGI design pattern in several important ways. It parses the HTTP Request Headers for you and adds these to the environment. It supplies any POST-oriented input as a file-like object in the environment. It also provides you a function that will formulate the response, saving you from a lot of formatting details.
What do I need to know / install / do if I want to run a web framework (say web.py or cherrypy) on my basic CGI configuration?
Recall that forking a subprocess is expensive. There are two ways to work around this.
Embedded mod_wsgi or mod_python embeds Python inside Apache; no process is forked. Apache runs the Django application directly.
Daemon mod_wsgi or mod_fastcgi allows Apache to interact with a separate daemon (or "long-running process"), using the WSGI protocol. You start your long-running Django process, then you configure Apache's mod_fastcgi to communicate with this process.
Note that mod_wsgi can work in either mode: embedded or daemon.
When you read up on mod_fastcgi, you'll see that Django uses flup to create a WSGI-compatible interface from the information provided by mod_fastcgi. The pipeline works like this.
Apache -> mod_fastcgi -> FLUP (via FastCGI protocol) -> Django (via WSGI protocol)
Django has several "django.core.handlers" for the various interfaces.
For mod_fastcgi, Django provides a manage.py runfcgi that integrates FLUP and the handler.
For mod_wsgi, there's a core handler for this.
How to install WSGI support?
Follow these instructions.
https://code.google.com/archive/p/modwsgi/wikis/IntegrationWithDjango.wiki
For background see this
http://docs.djangoproject.com/en/dev/howto/deployment/#howto-deployment-index
I think Florian's answer answers the part of your question about "what is WSGI", especially if you read the PEP.
As for the questions you pose towards the end:
WSGI, CGI, FastCGI etc. are all protocols for a web server to run code, and deliver the dynamic content that is produced. Compare this to static web serving, where a plain HTML file is basically delivered as is to the client.
CGI, FastCGI and SCGI are language agnostic. You can write CGI scripts in Perl, Python, C, bash, whatever. CGI defines which executable will be called, based on the URL, and how it will be called: the arguments and environment. It also defines how the return value should be passed back to the web server once your executable is finished. The variations are basically optimisations to be able to handle more requests, reduce latency and so on; the basic concept is the same.
WSGI is Python only. Rather than a language agnostic protocol, a standard function signature is defined:
def simple_app(environ, start_response):
"""Simplest possible application object"""
status = '200 OK'
response_headers = [('Content-type','text/plain')]
start_response(status, response_headers)
return ['Hello world!\n']
That is a complete (if limited) WSGI application. A web server with WSGI support (such as Apache with mod_wsgi) can invoke this function whenever a request arrives.
The reason this is so great is that we can avoid the messy step of converting from a HTTP GET/POST to CGI to Python, and back again on the way out. It's a much more direct, clean and efficient linkage.
It also makes it much easier to have long-running frameworks running behind web servers, if all that needs to be done for a request is a function call. With plain CGI, you'd have to start your whole framework up for each individual request.
To have WSGI support, you'll need to have installed a WSGI module (like mod_wsgi), or use a web server with WSGI baked in (like CherryPy). If neither of those are possible, you could use the CGI-WSGI bridge given in the PEP.
You can run WSGI over CGI as Pep333 demonstrates as an example. However every time there is a request a new Python interpreter is started and the whole context (database connections, etc.) needs to be build which all take time.
The best if you want to run WSGI would be if your host would install mod_wsgi and made an appropriate configuration to defer control to an application of yours.
Flup is another way to run with WSGI for any webserver that can speak FCGI, SCGI or AJP. From my experience only FCGI really works, and it can be used in Apache either via mod_fastcgi or if you can run a separate Python daemon with mod_proxy_fcgi.
WSGI is a protocol much like CGI, which defines a set of rules how webserver and Python code can interact, it is defined as Pep333. It makes it possible that many different webservers can use many different frameworks and applications using the same application protocol. This is very beneficial and makes it so useful.
If you are unclear on all the terms in this space, and lets face it, its a confusing acronym-laden one, there's also a good background reader in the form of an official python HOWTO which discusses CGI vs. FastCGI vs. WSGI and so on: http://docs.python.org/howto/webservers.html
It's a simple abstraction layer for Python, akin to what the Servlet spec is for Java. Whereas CGI is really low level and just dumps stuff into the process environment and standard in/out, the above two specs model the http request and response as constructs in the language. My impression however is that in Python folks have not quite settled on de-facto implementations so you have a mix of reference implementations, and other utility-type libraries that provide other things along with WSGI support (e.g. Paste). Of course I could be wrong, I'm a newcomer to Python. The "web scripting" community is coming at the problem from a different direction (shared hosting, CGI legacy, privilege separation concerns) than Java folks had the luxury of starting with (running a single enterprise container in a dedicated environment against statically compiled and deployed code).