What options do I have for the deployment of a CPU bound Python-WSGI application on Windows?
The application benefits greatly from multiple CPUs (image manipulation/encoding) but the GIL prevents it from using them.
My understanding is:
mod_wsgi has no support for WSGIDaemonProcess on Windows and Apache itself only runs with one process
all fork based solutions (flup, spawning, gunicorn) only work on unix
Are there any other deployment options I'm missing?
PS: I asked this on Server Fault, but someone suggested asking here.
I have successfully used isapi-wsgi to deploy WSGI web apps on Windows IIS (I assume that since you're deploying on Windows, IIS is an option).
Create an IIS Application Pool to host your application in and configure it as a Web Garden (Properties | Performance | Maximum number of worker processes).
Disclaimer: I have never used this feature myself (I have always used the default App Pool configuration, where Max. number of worker processes is 1). But it is my understanding that this will spin up more processes to handle requests.
It would be a bit of a mess, but you could use the subprocess module to fire off worker processes yourself. I'm pretty sure that Popen.wait() and/or Popen.communicate() ought to release the GIL. You've still got the process creation overhead though, so you might not gain a lot/anything over standard CGI.
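A rough sketch of the subprocess idea, under a few assumptions: the worker is a throwaway child interpreter that hashes its stdin (a stand-in for the real image work), and the names WORKER_CODE, run_worker and run_many are made up for illustration. Each thread blocks in communicate(), which should let the other threads and child processes run in parallel.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Hypothetical worker: a child interpreter that reads bytes from stdin and
# prints a SHA-256 hex digest. Real image encoding would go here instead.
WORKER_CODE = (
    "import sys, hashlib; "
    "data = sys.stdin.buffer.read(); "
    "sys.stdout.write(hashlib.sha256(data).hexdigest())"
)


def run_worker(payload):
    """Run one child process and return its output as text."""
    proc = subprocess.Popen(
        [sys.executable, "-c", WORKER_CODE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    # The calling thread waits here while the child does the CPU-bound work.
    out, _ = proc.communicate(payload)
    if proc.returncode != 0:
        raise RuntimeError("worker exited with status %d" % proc.returncode)
    return out.decode("ascii")


def run_many(payloads, max_threads=4):
    """Process several payloads at once, one child process per payload."""
    with ThreadPoolExecutor(max_workers=max_threads) as executor:
        return list(executor.map(run_worker, payloads))
```

Every call still pays the cost of starting a new interpreter, so this only helps when each job is heavy enough to dwarf that overhead.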
Another option is to have separate server/worker processes running the whole time and use some form of IPC, although this isn't going to be an easy option. Have a look at the multiprocessing module, and potentially also Pyro.
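For the long-running worker variant, something along these lines might be a starting point. It is only a sketch: cpu_bound_work is a placeholder for the real encoding step, make_app is a hypothetical helper, and it assumes the WSGI server handles requests in several threads so that more than one worker process can be busy at a time.

```python
import hashlib
import io
from multiprocessing import Pool


def cpu_bound_work(payload):
    """Placeholder for image manipulation: repeated hashing to burn CPU."""
    digest = payload
    for _ in range(1000):
        digest = hashlib.sha256(digest).digest()
    return digest.hex()


def make_app(pool):
    """Build a WSGI app that hands the heavy work to worker processes."""

    def app(environ, start_response):
        length = int(environ.get("CONTENT_LENGTH") or 0)
        payload = environ["wsgi.input"].read(length)
        # The request thread blocks here while a worker process computes.
        result = pool.apply(cpu_bound_work, (payload,))
        body = result.encode("ascii")
        start_response("200 OK", [
            ("Content-Type", "text/plain"),
            ("Content-Length", str(len(body))),
        ])
        return [body]

    return app


if __name__ == "__main__":
    # The guard matters on Windows, where worker processes re-import this module.
    with Pool(processes=2) as pool:
        app = make_app(pool)
        environ = {"CONTENT_LENGTH": "5", "wsgi.input": io.BytesIO(b"hello")}
        print(app(environ, lambda status, headers: None))
```

The pool is created once and reused, so the process start-up cost is paid up front rather than per request.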
Related
I need to run some tasks in background of web app (checking the code out, etc) without blocking the views.
The twist in typical Queue/Celery scenario is that I have to ensure that the tasks will complete, surviving even web app crash or restart until those tasks complete, whatever their final result.
I was thinking about recording parameters for multiprocessing.Pool in a database and starting all the incomplete tasks at webapp restart. It's doable, but I'm wondering if there's a simpler or more cost-effective approach?
UPDATE: Why not Celery itself? Well, I used Celery in some projects and it's really a great solution, but for this task it's on the big side: it requires a separate server, communication, etc., while all I need is spawning a few processes/threads, doing some work in them (git clone ..., svn co ...) and checking whether they succeeded or failed. Another issue is that I need the solution to be as small as possible since I have to make it follow elaborate corporate guidelines, procedures, etc., and the human administrative and bureaucratic overhead I'd have to go through to get Celery onboard is something I'd prefer to avoid if I can.
I would suggest using Celery.
Celery does not require its own server, you can have a worker running on the same machine. You can also have a "poor man's queue" using an SQL database instead of a "real" queue/messaging server such as RabbitMQ - this setup would look very much like what you're describing, only with a separate process doing the long-running tasks.
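To give an idea of how small the database-backed variant can be, here is a minimal sketch using sqlite3. The table layout and the function names (open_journal, record_task, finish_task, unfinished_tasks) are invented for illustration, and it leaves out locking between multiple workers, which a real setup would need.

```python
import sqlite3

# Hypothetical schema: one row per task, with a status column that is
# only changed once the task has actually finished.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tasks (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    command TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending'
)
"""


def open_journal(path):
    """Open (and if needed create) the task journal."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    conn.commit()
    return conn


def record_task(conn, command):
    """Persist a task before starting it, and return its id."""
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute(
            "INSERT INTO tasks (command) VALUES (?)", (command,)
        )
    return cursor.lastrowid


def finish_task(conn, task_id, succeeded):
    """Mark a task as done or failed once its outcome is known."""
    status = "done" if succeeded else "failed"
    with conn:
        conn.execute(
            "UPDATE tasks SET status = ? WHERE id = ?", (status, task_id)
        )


def unfinished_tasks(conn):
    """Return (id, command) pairs that never reached a final status."""
    return conn.execute(
        "SELECT id, command FROM tasks WHERE status = 'pending' ORDER BY id"
    ).fetchall()
```

At startup the worker would call unfinished_tasks() and run whatever it finds again, which also means the tasks have to be safe to repeat.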
The problem with starting long-running tasks from the webserver process is that in the production environment the web "workers" are normally managed by the webserver - multiple workers can be spawned or killed at any time. The viability of your approach would highly depend on the web server you're using and its configuration. Also, with multiple workers each trying to do a task you may have some concurrency issues.
Apart from Celery, another option is to look at uWSGI's spooler subsystem, especially if you're already using uWSGI.
Ubuntu 11.10, Python 2.6. Background: I have an existing Python app that is using Twisted to sit in a loop and wait for RESTful commands to come in. So the app starts up, kicks off threads that do various things, and main sets up callbacks for Twisted, then calls Twisted.reactor.run(), which blocks forever. When a request comes in, the appropriate handler is called, stuff happens, a reply is sent back.
My job is now to remove Twisted because management has decided they don't like it. We're moving to Apache as our web server.
Using the documentation, I have successfully installed and configured Apache2.0 to serve web pages. I also installed mod_wsgi, and was able to configure it and Apache to execute arbitrary Python code when a request comes in. So I'm good on that side.
What I'm missing is how to connect my Python application to the Apache/mod_wsgi bits, since the application needs to be persistent and always running. It was suggested that I open a pipe between my wsgi script and my main application, and serialize the requests that way. But it seems like this is something that should already be out there, I just don't know enough to know what to search for.
Any pushes in the right direction are greatly appreciated.
Further edit for clarity: I'm not making a webserver. The application in question is a host app that is running on a virtual machine. It happens to be controlled by a RESTful interface via HTTP. So all it needs to do is be able to listen for incoming commands and reply to them.
mod_wsgi may not be the proper tool for this job, which is fine, I just don't know what is.
Does the daemon mode of mod_wsgi offer enough persistence in your case? Or if you want to run the main process separately from Apache, how about mod_fastcgi? Maybe running Apache as a reverse proxy could be an option too.
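To illustrate what daemon mode gives you: as far as I know, the WSGI script is imported once per daemon process, so module-level objects stay alive between requests until that process is restarted. The sketch below assumes a daemon process group configured with a single process; HostState and STATE are made-up names, while application is the entry point name mod_wsgi looks for by default.

```python
import threading
import time


class HostState:
    """Hypothetical long-lived state kept by the host application."""

    def __init__(self):
        self._lock = threading.Lock()
        self._handled = 0
        self.started_at = time.time()

    def record_request(self):
        """Count one request; safe to call from several request threads."""
        with self._lock:
            self._handled += 1
            return self._handled


# Created once when the module is imported, then shared by every request
# served by this process.
STATE = HostState()


def application(environ, start_response):
    """WSGI entry point: reports how many requests this process has seen."""
    count = STATE.record_request()
    body = ("requests handled by this process: %d\n" % count).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Background threads could be started next to STATE in the same way, but they will go away whenever Apache recycles the daemon process.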
It was suggested that I open a pipe between my wsgi script and my main application, and serialize the requests that way.
That's what multiprocessing queues are for.
http://docs.python.org/library/multiprocessing.html
http://docs.python.org/library/multiprocessing.html#pipes-and-queues
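One caveat: multiprocessing.Queue is shared between a parent and the processes it starts, so for a WSGI script talking to an independently started application, the Listener/Client pair from multiprocessing.connection is probably the closer fit. A minimal sketch, where handle_command, serve_one, send_command and the AUTHKEY value are all placeholders:

```python
import threading
from multiprocessing.connection import Client, Listener

# Placeholder secret; both sides must use the same key.
AUTHKEY = b"change-me"


def handle_command(command):
    """Hypothetical handler inside the long-running application."""
    return {"status": "ok", "echo": command}


def serve_one(listener):
    """Application side: accept one connection and answer one request."""
    with listener.accept() as conn:
        conn.send(handle_command(conn.recv()))


def send_command(address, command, authkey=AUTHKEY):
    """WSGI side: send one pickled command and wait for the reply."""
    with Client(address, authkey=authkey) as conn:
        conn.send(command)
        return conn.recv()


if __name__ == "__main__":
    # Both ends run in one process here only to keep the demo self-contained;
    # port 0 asks the OS for any free port.
    with Listener(("127.0.0.1", 0), authkey=AUTHKEY) as listener:
        server = threading.Thread(target=serve_one, args=(listener,))
        server.start()
        print(send_command(listener.address, {"action": "status"}))
        server.join()
```

In a real deployment the application would loop over listener.accept() on a fixed address, and the WSGI script would call send_command() once per request.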
You'll be even happier if you start using Celery.
Celery will allow you to "remove Twisted because management has decided they don't like it."
However, switching to Celery means that things like "So the app starts up, kicks off threads that do various things, and main sets up callbacks for Twisted, then calls Twisted.reactor.run(), which blocks forever" all have to be completely rethought. Instead of some main polling loop, you now have multiple, independent processes that are coordinated by Celery.
What you'll find is all the housekeeping in your application -- all the coordination among threads -- the callbacks -- all that -- will go away. You'll be left with a few Python scripts that do the "real work" and Celery to manage the distributed task queue.
I'm currently writing a web application using Django, Apache, and mod_wsgi that provides some FreeBSD server management and configuration features, including common firewall operations.
My Python/C library uses raw sockets to interact directly with the firewall and works perfectly fine when running as root, but raw socket operations are only allowed for root.
At this point, the only thing I can think of is to install and use sudo to explicitly allow the www user access to /sbin/ipfw which isn't ideal since I would prefer to use my raw socket library operations rather than a subprocess call.
I suppose another option would be to write my own service (listening on a local domain socket) or use an existing job system (Celery?) that runs as root and handles these requests.
Or perhaps there's some WSGI Daemon mode trickery I'm unaware of? I'm sure this issue has been encountered before. Any advice on the best way to handle this?
Use Celery or some other back end service which runs as root. Having a web application process run as root is a security problem waiting to happen. This is why mod_wsgi blocks you running daemon processes as root. Sure you could hack the code to disable the exclusion, but I am not about to tell you how to do that.
In the Tornado documentation they show how they can get very large throughput from 4 frontends. I'd like to run an app in the same way, and would like to have the frontends running as daemon processes managed with an init.d script*.
I'm fairly new to Python so don't really know where to start. Currently I'm starting the Tornado server manually in the terminal, passing in a new port number each time.
I've tried using the python-daemon package in conjunction with the lockfile package but the lockfiles that are created don't have the process ids in them and I can't see how to then kill the processes gracefully later on.
I don't really know where to go from here, and the Tornado docs leave a large chunk out regarding deployment.
* If there's a better way to manage the processes so that they can be monitored and managed as a group then please let me know.
Try Supervisor. It's great for managing multiple daemon processes. You configure your applications in the supervisord.conf file and supervisord itself is launched from an init.d script.
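As a rough idea, a program section for four Tornado frontends might look like the fragment below. The script path and the --port flag are assumptions about your app; numprocs, process_name and the %(process_num) expansion are standard Supervisor settings.

```ini
[program:tornado]
; Hypothetical script path and flag; expands to ports 8000 through 8003.
command=python /path/to/app.py --port=80%(process_num)02d
; process_name must include process_num when numprocs is greater than 1.
process_name=%(program_name)s-%(process_num)d
numprocs=4
autostart=true
autorestart=true
```

The four processes can then be started, stopped and restarted as a group with supervisorctl, for example `supervisorctl restart tornado:*`.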
I can vouch for Supervisor too. We have been using Tornado in production with 4 instances under Supervisor, and it has been running very smoothly.
Or should I be using a totally different server?
Nginx with mod_wsgi requires the use of a non-blocking asynchronous framework and setup, and isn't likely to work out of the box with Pylons.
I usually go with the proxy route to a stand-alone Pylons process using the PasteScript#cherrypy WSGI server (as it's higher performing than the Paste#http one, though it won't recycle threads if you have leaks...).
If you're set on using Apache and it's your server (so you can compile and run Apache mod_wsgi), I'd suggest using that setup, as it's less maintenance to effectively utilize multiple cores. With a proxy setup, you'd have to use mod_proxy_balancer with multiple paste processes to effectively utilize multiple cores/CPUs.
If you're deploying to someone else's Apache (shared hosting), mod_proxy is generally the easier solution, as it's stock in Apache 2.2 and above.
Personally, I usually deploy with nginx + proxy to multiple paster processes.
I've also used mod_fastcgi + flup to great success several times now. There are a couple of recipes floating around for setting this up, but unfortunately it will probably still require some tweaking on your part to get everything working:
http://wiki.pylonshq.com/display/pylonscookbook/Production+Deployment+Using+Apache,+FastCGI+and+mod_rewrite