How Python web frameworks, WSGI and CGI fit together


I have a Bluehost account where I can run Python scripts as CGI. I guess it's the simplest CGI, because to run I have to define the following in .htaccess:
Options +ExecCGI
AddType text/html py
AddHandler cgi-script .py
Now, whenever I look up web programming with Python, I hear a lot about WSGI and how most frameworks use it. But I just don't understand how it all fits together, especially when my web server is given (Apache running at a host's machine) and not something I can really play with (except defining .htaccess commands).
How are WSGI, CGI, and the frameworks all connected? What do I need to know, install, and do if I want to run a web framework (say web.py or CherryPy) on my basic CGI configuration? How to install WSGI support?

How WSGI, CGI, and the frameworks are all connected?
Apache listens on port 80. It gets an HTTP request. It parses the request to find a way to respond. Apache has a LOT of choices for responding. One way to respond is to use CGI to run a script. Another way to respond is to simply serve a file.
In the case of CGI, Apache prepares an environment and invokes the script through the CGI protocol. This is a standard Unix Fork/Exec situation -- the CGI subprocess inherits an OS environment including the socket and stdout. The CGI subprocess writes a response, which goes back to Apache; Apache sends this response to the browser.
CGI is primitive and annoying, mostly because it forks a subprocess for every request, and the subprocess must exit or close stdout and stderr to signal the end of the response.
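To make the CGI mechanics concrete, here is a minimal sketch of a Python CGI script. It assumes only what CGI itself provides: request details in environment variables (QUERY_STRING, REQUEST_METHOD) and a response written to stdout. The build_response helper and the "name" query parameter are made up for illustration.

```python
#!/usr/bin/env python3
"""Minimal CGI script: request data arrives via environment variables and
stdin; the response (headers, blank line, body) is written to stdout."""
import os
import sys
from urllib.parse import parse_qs


def build_response(environ):
    """Return the full CGI response text for the given environment mapping."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    name = query.get("name", ["world"])[0]  # "name" is a hypothetical parameter
    method = environ.get("REQUEST_METHOD", "GET")
    body = "Hello, %s! (method: %s)\n" % (name, method)
    # A CGI response is header lines, one empty line, then the body.
    return "Content-Type: text/plain\r\n\r\n" + body


if __name__ == "__main__":
    # The web server has already placed the request details in os.environ.
    sys.stdout.write(build_response(os.environ))
```

Every request starts a fresh interpreter that runs this file from top to bottom and then exits, which is the per-request cost described above.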
WSGI is an interface that is based on the CGI design pattern. It is not necessarily CGI -- it does not have to fork a subprocess for each request. It can be CGI, but it doesn't have to be.
WSGI adds to the CGI design pattern in several important ways. It parses the HTTP Request Headers for you and adds these to the environment. It supplies any POST-oriented input as a file-like object in the environment. It also provides you a function that will formulate the response, saving you from a lot of formatting details.
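The three additions can be seen in a small sketch. The echo_app and call_app names are hypothetical; call_app is only a stand-in for a real server so the example can run on its own, and it relies on wsgiref.util.setup_testing_defaults from the standard library to fill in missing environ keys.

```python
import io
from wsgiref.util import setup_testing_defaults


def echo_app(environ, start_response):
    """Echo the User-Agent header and any request body back to the client."""
    # Request headers show up as HTTP_* keys in the environ dictionary.
    agent = environ.get("HTTP_USER_AGENT", "unknown")
    # The request body (e.g. POST data) is a file-like object.
    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = environ["wsgi.input"].read(length)
    payload = b"agent=" + agent.encode("utf-8") + b"\nbody=" + body + b"\n"
    # start_response takes the status and headers; the server formats them.
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(payload)))])
    return [payload]


def call_app(app, environ):
    """Tiny stand-in for a server: call the app and capture what it returns."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    setup_testing_defaults(environ)  # fills in any missing required keys
    body = b"".join(app(environ, start_response))
    return captured["status"], body


if __name__ == "__main__":
    print(call_app(echo_app, {
        "REQUEST_METHOD": "POST",
        "HTTP_USER_AGENT": "demo",
        "CONTENT_LENGTH": "5",
        "wsgi.input": io.BytesIO(b"a=1&b"),
    }))
```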
What do I need to know / install / do if I want to run a web framework (say web.py or cherrypy) on my basic CGI configuration?
Recall that forking a subprocess is expensive. There are two ways to work around this.
Embedded: mod_wsgi or mod_python embeds Python inside Apache; no process is forked. Apache runs the Django application directly.
Daemon: mod_wsgi or mod_fastcgi allows Apache to interact with a separate daemon (or "long-running process"). With mod_fastcgi, Apache talks to the daemon over the FastCGI protocol, and the daemon hands each request to the application via WSGI. You start your long-running Django process, then you configure Apache's mod_fastcgi to communicate with this process.
Note that mod_wsgi can work in either mode: embedded or daemon.
When you read up on mod_fastcgi, you'll see that Django uses flup to create a WSGI-compatible interface from the information provided by mod_fastcgi. The pipeline works like this.
Apache -> mod_fastcgi -> FLUP (via FastCGI protocol) -> Django (via WSGI protocol)
Django has several "django.core.handlers" for the various interfaces.
For mod_fastcgi, Django provides a manage.py runfcgi that integrates FLUP and the handler.
For mod_wsgi, there's a core handler for this.
How to install WSGI support?
Follow these instructions.
https://code.google.com/archive/p/modwsgi/wikis/IntegrationWithDjango.wiki
For background see this
http://docs.djangoproject.com/en/dev/howto/deployment/#howto-deployment-index

I think Florian's answer answers the part of your question about "what is WSGI", especially if you read the PEP.
As for the questions you pose towards the end:
WSGI, CGI, FastCGI etc. are all protocols for a web server to run code, and deliver the dynamic content that is produced. Compare this to static web serving, where a plain HTML file is basically delivered as is to the client.
CGI, FastCGI and SCGI are language agnostic. You can write CGI scripts in Perl, Python, C, bash, whatever. CGI defines which executable will be called, based on the URL, and how it will be called: the arguments and environment. It also defines how the return value should be passed back to the web server once your executable is finished. The variations are basically optimisations to be able to handle more requests, reduce latency and so on; the basic concept is the same.
WSGI is Python only. Rather than a language agnostic protocol, a standard function signature is defined:
def simple_app(environ, start_response):
    """Simplest possible application object"""
    status = '200 OK'
    response_headers = [('Content-type', 'text/plain')]
    start_response(status, response_headers)
    # The body must be bytes under Python 3 (PEP 3333).
    return [b'Hello world!\n']
That is a complete (if limited) WSGI application. A web server with WSGI support (such as Apache with mod_wsgi) can invoke this function whenever a request arrives.
The reason this is so great is that we can avoid the messy step of converting from an HTTP GET/POST to CGI to Python, and back again on the way out. It's a much more direct, clean and efficient linkage.
It also makes it much easier to have long-running frameworks running behind web servers, if all that needs to be done for a request is a function call. With plain CGI, you'd have to start your whole framework up for each individual request.
To have WSGI support, you'll need to have installed a WSGI module (like mod_wsgi), or use a web server with WSGI baked in (like CherryPy). If neither of those is possible, you could use the CGI-WSGI bridge given in the PEP.
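For the CGI-only hosting case in the question, a sketch of such a bridge follows. Instead of copying the PEP's code, it uses wsgiref.handlers.CGIHandler from the Python standard library, which plays the same role. The application function here is a placeholder; a framework's WSGI application object could be passed in its place.

```python
#!/usr/bin/env python3
"""Run a WSGI application as a plain CGI script via the standard library."""
from wsgiref.handlers import CGIHandler


def application(environ, start_response):
    """Placeholder app; any WSGI callable can be used here instead."""
    body = b"Hello from WSGI over CGI\n"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


if __name__ == "__main__":
    # CGIHandler reads the request from os.environ and stdin and writes the
    # response to stdout, which is what the web server's CGI support expects.
    CGIHandler().run(application)
```

Saved as an executable .py file under the .htaccess setup from the question, this should run without any WSGI module installed, at the usual CGI price of one interpreter start per request.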

You can run WSGI over CGI, as PEP 333 demonstrates with an example. However, every time there is a request, a new Python interpreter is started and the whole context (database connections, etc.) needs to be built, all of which takes time.
The best option, if you want to run WSGI, would be for your host to install mod_wsgi and configure it to hand control to an application of yours.
Flup is another way to run WSGI on any web server that can speak FCGI, SCGI or AJP. In my experience only FCGI really works, and it can be used in Apache either via mod_fastcgi or, if you can run a separate Python daemon, with mod_proxy_fcgi.
WSGI is a protocol much like CGI: it defines a set of rules for how a web server and Python code interact, and it is specified in PEP 333. It lets many different web servers run many different frameworks and applications through the same application protocol, which is what makes it so useful.

If you are unclear on all the terms in this space (and let's face it, it's a confusing, acronym-laden one), there's also a good background reader in the form of an official Python HOWTO which discusses CGI vs. FastCGI vs. WSGI and so on: http://docs.python.org/howto/webservers.html

It's a simple abstraction layer for Python, akin to what the Servlet spec is for Java. Whereas CGI is really low level and just dumps stuff into the process environment and standard in/out, the above two specs model the http request and response as constructs in the language. My impression however is that in Python folks have not quite settled on de-facto implementations so you have a mix of reference implementations, and other utility-type libraries that provide other things along with WSGI support (e.g. Paste). Of course I could be wrong, I'm a newcomer to Python. The "web scripting" community is coming at the problem from a different direction (shared hosting, CGI legacy, privilege separation concerns) than Java folks had the luxury of starting with (running a single enterprise container in a dedicated environment against statically compiled and deployed code).

Related

Could you explain more detailed differences between mod_wsgi and werkzeug? (SOS newbies)

As stated in the title, I'm not confident in my basic understanding of these two.
As far as I know, mod_wsgi implements the WSGI specification and runs under the Apache web server.
It is written in C.
Werkzeug, on the other hand, is a toolkit that provides useful utilities.
I also saw that Werkzeug can run a simple server implemented in its own source (make_server in serving.py). I'm aware that Werkzeug has both useful utilities and a simple server feature.
What I want to know is the following.
When using a Werkzeug-based framework like Flask under the Apache web server, what exactly does mod_wsgi do?
Werkzeug also has basic HTTP server functionality that does not need mod_wsgi.
Can anyone explain the differences between mod_wsgi and Werkzeug?
From the perspective of a web server, mod_wsgi and Werkzeug seem to have duplicate features.

WSGI stands for Web Server Gateway Interface, (mostly) defined by PEP 333 at http://www.python.org/dev/peps/pep-0333/ .
It is an effort by the Python community to establish a standard mechanism for web servers to speak to Python applications.
In theory, any wsgi compliant server (or extension to an existing web server) should be able to load and run any wsgi compliant application.
werkzeug is a web application framework which can run under a compliant WSGI server, such as Apache+mod_wsgi. It also contains a built-in development server that you can use for development.
WSGI can be very confusing at first, but it is actually pretty simple. The WSGI spec requires that your Python application do the following:
define a callable named application
said callable should accept 2 parameters: (environ, start_response)
environ is a dictionary of environment variables
start_response is a callable that needs to be called to start the response
Once application is called, it handles the request, builds the output, and:
calls start_response('200 OK', headers)
returns [content]
A simple WSGI app might look like this:
def application(environ, start_response):
    status = '200 OK'
    output = b'Hello World!'  # the body must be bytes under Python 3
    response_headers = [('Content-type', 'text/plain'),
                        ('Content-Length', str(len(output)))]
    start_response(status, response_headers)
    return [output]
It is strongly suggested that you use an existing WSGI framework, as there are a lot of details involved in parsing HTTP requests, handling file uploads, encoding characters, etc...
Take a look at Bottle, Flask, werkzeug, AppStruct, etc...

mod_wsgi is a WSGI-compliant Apache module that bridges Python and Apache. It lets you run applications coded to the WSGI spec under Apache.
Werkzeug is a WSGI utility library, used to build WSGI-compliant applications. It ships with a development server.
There are a handful of Python web application frameworks: Pyramid/Pylons, Flask, Bottle, Django, CherryPy, etc. They all implement the WSGI spec, which is the de facto standard for building web applications in Python ( http://en.wikipedia.org/wiki/Web_Server_Gateway_Interface )
Most web application frameworks ship with a debug-only or production-capable web server. When you have a WSGI app, you can serve it via the library's own server, through Apache via mod_wsgi, or using a 'pure' WSGI server like uWSGI, gunicorn, fapws, or twisted.
Most people I know will deploy a WSGI app like this:
a lightweight server, like nginx, listens on port 80
the lightweight server serves static files itself
the lightweight server proxies application requests to another server, which is often uWSGI, but sometimes Apache+mod_wsgi or another. Depending on the setup, the proxying can be an HTTP proxy, or a connection to the uWSGI server directly or through a socket.
With that being said, to specifically answer your question, read the first paragraph of this docs page - http://werkzeug.pocoo.org/docs/serving/ :
There are many ways to serve a WSGI application. While you’re developing it, you usually don’t want to have a full-blown webserver like Apache up and running, but instead a simple standalone one. Because of that Werkzeug comes with a builtin development server.
For development reasons, or on a very low traffic site, you can just use the Werkzeug server. If you're deploying an application that will get a reasonable amount of traffic, you'll want something more robust.
mod_wsgi and uWSGI duplicate the serving features of Werkzeug, but they do so because they can do it considerably better: faster response times, lower memory use, better concurrency, more stability, and so on. The Werkzeug server is "good enough" for many uses, but it's not "the best way" to serve a WSGI-compliant app.

How to run nginx + python (without django)

I want to have simple program in python that can process different requests (POST, GET, MULTIPART-FORMDATA). I don't want to use a complete framework.
I basically need to be able to get GET and POST params - probably (but not necessarily) in a way similar to PHP. To get some other SERVER variables like REQUEST_URI, QUERY, etc.
I have installed nginx successfully, but I've failed to find a good example of how to do the rest. So a simple tutorial, or any directions and ideas on how to set up nginx to run a certain Python process for a certain virtual host, would be most welcome!

Although you can make Python run a webserver by itself with wsgiref, I would recommend using one of the many Python webservers and/or web frameworks around. For pure and simple Python webhosting we have several options available:
gunicorn
tornado
twisted
uwsgi
cherrypy
If you're looking for more features you can look at some web frameworks:
werkzeug
flask
masonite
cherrypy (yes, cherrypy is both a webserver and a framework)
django (for completeness, I know that was not the purpose of the question)
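For the wsgiref route mentioned at the top of this answer, a minimal sketch of reading GET and POST parameters without any framework might look like this. The get_params and serve helpers, the host and the port are assumptions for illustration; only urlencoded form bodies are handled, and multipart/form-data would need a dedicated parser.

```python
from urllib.parse import parse_qs
from wsgiref.simple_server import make_server


def get_params(environ):
    """Return (GET params, POST params) as dicts of lists, like parse_qs."""
    get = parse_qs(environ.get("QUERY_STRING", ""))
    post = {}
    if environ.get("REQUEST_METHOD") == "POST":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = environ["wsgi.input"].read(length).decode("utf-8")
        # Only urlencoded bodies are parsed in this sketch.
        post = parse_qs(raw)
    return get, post


def application(environ, start_response):
    """Show the request path and the parsed parameters as plain text."""
    get, post = get_params(environ)
    lines = ["path=%s" % environ.get("PATH_INFO", ""),
             "get=%s" % sorted(get.items()),
             "post=%s" % sorted(post.items())]
    body = "\n".join(lines).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8"),
                              ("Content-Length", str(len(body)))])
    return [body]


def serve(host="127.0.0.1", port=8000):
    """Start the standard library's development server (not for production)."""
    with make_server(host, port, application) as httpd:
        httpd.serve_forever()
```

Calling serve() starts a local development server that nginx could forward requests to; for real traffic, one of the servers listed above would be a better fit.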

You should look into using Flask -- it's an extremely lightweight interface to a WSGI server (werkzeug) which also includes a templating library, should you ever want to use one. But you can totally ignore it if you'd like.

You can use thttpd. It is a lightweight HTTP server that can run CGI scripts, and it works well with nginx. How to set up thttpd with nginx is detailed here: http://nginxlibrary.com/running-cgi-scripts-using-thttpd/

Either way, you must use a WSGI server, as nginx does not fully support this protocol.

Python/C Raw Socket Operations using Django, Mod_WSGI, Apache

I'm currently writing a web application using Django, Apache, and mod_wsgi that provides some FreeBSD server management and configuration features, including common firewall operations.
My Python/C library uses raw sockets to interact directly with the firewall and works perfectly fine when running as root, but raw socket operations are only allowed for root.
At this point, the only thing I can think of is to install and use sudo to explicitly allow the www user access to /sbin/ipfw which isn't ideal since I would prefer to use my raw socket library operations rather than a subprocess call.
I suppose another option would be to write my own service (using local domain sockets) or use an existing job system (Celery?) that runs as root and handles these requests.
Or perhaps there's some WSGI Daemon mode trickery I'm unaware of? I'm sure this issue has been encountered before. Any advice on the best way to handle this?

Use Celery or some other back end service which runs as root. Having a web application process run as root is a security problem waiting to happen. This is why mod_wsgi blocks you running daemon processes as root. Sure you could hack the code to disable the exclusion, but I am not about to tell you how to do that.
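If you go the "some other back end service" route rather than Celery, a rough sketch of a small root-owned helper reachable over a local domain socket could look like the following. Everything here is hypothetical: the socket path, the action names, the JSON message format and the file mode are assumptions, and the actual firewall call is left as a placeholder.

```python
import json
import os
import socket

# Hypothetical allowlist: the only operations the web process may request.
ALLOWED_ACTIONS = {"list_rules", "add_rule", "delete_rule"}
SOCKET_PATH = "/var/run/fwhelper.sock"  # assumed location


def handle_request(raw):
    """Validate one JSON request and return a JSON reply as bytes.

    The real firewall call (e.g. the raw-socket library) would go where the
    placeholder reply is built; it is deliberately left out of this sketch.
    """
    try:
        action = json.loads(raw.decode("utf-8"))["action"]
    except (ValueError, KeyError, TypeError):
        return json.dumps({"ok": False, "error": "malformed request"}).encode()
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        return json.dumps({"ok": False, "error": "action not allowed"}).encode()
    return json.dumps({"ok": True, "action": action}).encode()


def serve(path=SOCKET_PATH):
    """Run as root: answer one request per connection on a Unix socket."""
    if os.path.exists(path):
        os.unlink(path)
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as server:
        server.bind(path)
        os.chmod(path, 0o660)  # restrict who may connect (assumed policy)
        server.listen(5)
        while True:
            conn, _ = server.accept()
            with conn:
                conn.sendall(handle_request(conn.recv(4096)))


def ask_helper(action, path=SOCKET_PATH):
    """Called from the unprivileged web application."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.connect(path)
        client.sendall(json.dumps({"action": action}).encode())
        return json.loads(client.recv(4096).decode("utf-8"))
```

The point of the design is that the web process never holds root privileges; it can only ask for a fixed set of operations, and the helper decides whether to carry them out.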

Differences and uses between WSGI, CGI, FastCGI, and mod_python in regards to Python?

I'm just wondering what the differences and advantages are for the different CGI's out there. Which one would be best for python scripts, and how would I tell the script what to use?

A part answer to your question, including scgi.
What's the difference between scgi and wsgi?
Is there a speed difference between WSGI and FCGI?
How Python web frameworks, WSGI and CGI fit together
CGI vs FCGI
Lazy and not writing it on my own. From the wikipedia: http://en.wikipedia.org/wiki/FastCGI
Instead of creating a new process for each request, FastCGI uses persistent processes to handle such requests. Multiple processes can be configured, increasing stability and scalability. Each individual FastCGI process can handle many requests over its lifetime, thereby avoiding the overhead of per-request process creation and termination.
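The practical effect of a persistent process can be illustrated with a small WSGI sketch (the counter and variable names are made up). Module-level state is initialised once per process, so under a long-running server it survives between requests, whereas under plain CGI every request would start again from a fresh interpreter and always report request #1.

```python
import time

_started = time.time()   # runs once per process, not once per request
_requests_served = 0     # survives between requests in a persistent process


def application(environ, start_response):
    """Report how many requests this process has handled so far."""
    global _requests_served
    _requests_served += 1
    text = "request #%d served by a process started %.0f s ago\n" % (
        _requests_served, time.time() - _started)
    body = text.encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

The same reasoning applies to expensive setup such as database connections or framework start-up: a persistent process pays for it once.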

There's also a good background reader on CGI, WSGI and other options, in the form of an official python HOWTO: http://docs.python.org/2/howto/webservers.html

In a project like Django, you can use a WSGI (Web Server Gateway Interface) server from the Flup module.
A WSGI server wraps a back-end process using one or more protocols:
FastCGI (calling a server process)
SCGI (Simple Common Gateway Interface - a simpler FastCGI)
AJP (Apache JServ Protocol - a Java FastCGI)
mod_python (runs pre-loaded code per request - uses lots of RAM)
CGI (Common Gateway Interface, starts a process per request - slow)
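Since 2019 or so, ASGI (Asynchronous Server Gateway Interface) has gained ground as the asynchronous successor to WSGI, which itself remains in wide use. ASGI is used by frameworks like FastAPI on servers like Uvicorn, and it can be much faster for I/O-bound workloads with many concurrent connections.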
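For comparison with the WSGI function signature, a minimal ASGI application is sketched below. The call_once helper is hypothetical and only drives the app without a server so the example is self-contained; a real deployment would hand the application to an ASGI server instead.

```python
import asyncio


async def application(scope, receive, send):
    """Minimal ASGI HTTP app: one response-start message, one body message."""
    if scope["type"] != "http":
        return  # ignore other scope types in this sketch
    body = b"Hello from ASGI\n"
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain"),
                    (b"content-length", str(len(body)).encode("ascii"))],
    })
    await send({"type": "http.response.body", "body": body})


def call_once(app, path="/"):
    """Drive the app without a server, collecting the messages it sends."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {"type": "http", "method": "GET", "path": path, "headers": []}
    asyncio.run(app(scope, receive, send))
    return sent
```

Where WSGI is a synchronous function call per request, the ASGI callable is a coroutine that exchanges messages with the server, which is what lets one process interleave many slow connections.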

FastCGI is a kind of CGI that is long-lived: its processes are always running.
With FastCGI, each request takes less time.
Because it keeps multiple processes running, FastCGI will use more memory than CGI.
In Detail Diff between FastCGI vs CGI

Minimal, Standalone, Distributable, cross platform web server

I've been writing a fair number of smaller WSGI apps lately and am looking for a web server that can be distributed, preconfigured to run the specific app. I know there are things like twisted and cherrypy which can serve up WSGI apps, but they seem to be missing a key piece of functionality for me, which is the ability to "pseudostream" large files using the HTTP Range header. Is there a web server available under a BSD or similar license which can be distributed as a standalone executable on any of the major platforms and which is capable of both proxying to a WSGI server (like cherrypy or the like) AND serving large files using the HTTP Range header?

Lighttpd has a BSD license, so you should be able to bundle it if you wanted.
You say it's for small apps, so I guess that means small, local, single-user web interfaces being served by a small HTTP server? If that is the case, then any Python implementation should work. Just use something like py2exe to package it (in fact, there was a question relating to packaging Python programs here on SO not too long ago).
Update, re: range header:
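The default Python HTTP server may not support the Range header you want, but it's pretty easy to write your own handler, or a small WSGI app to do the logic, especially if all you're doing is streaming a file. It wouldn't be too many lines:
def stream_file(environ, start_response):
    # Just an example: base_dir is assumed to be defined elsewhere, and only
    # an open-ended request header like "Range: bytes=1000-" is understood.
    fp = open(base_dir + environ["PATH_INFO"], "rb")
    start = environ.get("HTTP_RANGE", "bytes=0-")[len("bytes="):].split("-")[0]
    fp.seek(int(start or 0))
    # A real handler would reply "206 Partial Content" with a Content-Range header.
    start_response("200 OK", [("Content-Type", "application/octet-stream")])
    return fp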

What's wrong with Apache + mod_wsgi? Apache is already multiplatform; it's often already installed (except in Windows).
You might also want to look at lighttpd, there are some blogs on configuring it to work with WSGI. See http://cleverdevil.org/computing/24/python-fastcgi-wsgi-and-lighttpd, and http://redmine.lighttpd.net/issues/show/1523
