Python signals hosted on WSGI

I'm using Python's signal module to kill a function if it runs longer than a set period of time.
It works well in my tests, but when hosted on the server I get the following error:
"signal only works in main thread"
I have set the WSGI signals restriction to be off in my httpd.conf
WSGIRestrictSignal Off
as described at
http://code.google.com/p/modwsgi/wiki/ApplicationIssues#Registration_Of_Signal_Handlers
I'm using the functions from the recipe described here
http://code.activestate.com/recipes/307871/
Not sure what I'm doing wrong. Is there a way to ensure that the signals are called in the main thread?
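For reference, the recipe is a SIGALRM-based timeout along these lines (a rough sketch, not the recipe verbatim):
import signal

class TimeoutError(Exception):
    pass

def run_with_timeout(func, args=(), seconds=5):
    # signal.signal() and signal.alarm() may only be used from the main
    # thread, which is exactly what breaks under mod_wsgi's worker threads.
    def _handler(signum, frame):
        raise TimeoutError("call took longer than %s seconds" % seconds)

    old_handler = signal.signal(signal.SIGALRM, _handler)
    signal.alarm(seconds)
    try:
        return func(*args)
    finally:
        signal.alarm(0)                             # cancel any pending alarm
        signal.signal(signal.SIGALRM, old_handler)  # restore previous handler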

The only time any code under Apache/mod_wsgi runs in the main thread is when a WSGI script file is being imported via WSGIImportScript or an equivalent mechanism. Although you could use that to register the signal handler from the main thread, it would be of no use, as all subsequent requests are serviced by secondary threads and never by the main thread. As a consequence, I don't think your signal handler will ever run: recall that it can only run while the main thread is actually doing something, and that will not be the case here, because the main thread in Apache/mod_wsgi is simply blocked waiting for the process to be shut down.
What is the operation doing that you are trying to kill? Doing it within the context of a web application, especially a multithreaded one, probably isn't a good idea.
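If a per-request timeout is still needed, one signal-free option (an illustrative sketch only, not something the answer above prescribes) is to run the work in a child process and terminate it when it overruns:
import multiprocessing

def call_with_timeout(func, args=(), seconds=5):
    # Run func in a child process so it can be killed without using
    # signals in the web server's worker thread. func must be picklable
    # (i.e. defined at module level).
    proc = multiprocessing.Process(target=func, args=args)
    proc.start()
    proc.join(seconds)
    if proc.is_alive():
        proc.terminate()   # hard-kill the overrunning work
        proc.join()
        raise RuntimeError("operation timed out after %s seconds" % seconds)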

Related

Can i programmatically start a WSGI application server without blocking the main thread of execution?

I'm writing a web application in python, and using lettuce with splinter to write acceptance tests for it.
In order to do this, I need a WSGI server to start my application in the background so that it is available to my test suite. To do this, I've been spinning up a waitress server instance in another thread for the browser being driven to connect to:
def setUp():
    base = os.path.dirname(__file__) + "/../../.."
    world.app = loadapp('config:test.ini', relative_to=base)
    world.server_thread = thread.start_new_thread(serve_app, (world.app,))

def serve_app(app):
    serve(app, host='0.0.0.0', port=7654)
This works quite nicely: the running application instance is available to my test suite in order to set up fixtures/mocks etc., and the server is available for my tests to connect to. However, when the tests finish, I've no way of sending a signal to the server to shut itself down cleanly. Is there any way I can send a KeyboardInterrupt to the server thread I spawn (which waitress will catch and treat as a prompt to shut down cleanly), or, failing that, is there an existing WSGI server implementation which exposes non-blocking start() and stop() methods that I could use instead of waitress?
Mark the thread you run the WSGI server in as a daemon thread. That way, when you exit the main thread, the interpreter will not wait for the server thread to complete and will just exit the application.
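Applied to the snippet from the question, that might look like the following (a sketch only; world, loadapp, serve_app and the relevant imports are as in the question):
import threading

def setUp():
    base = os.path.dirname(__file__) + "/../../.."
    world.app = loadapp('config:test.ini', relative_to=base)
    # Daemon thread: the interpreter exits without waiting for it.
    world.server_thread = threading.Thread(target=serve_app, args=(world.app,))
    world.server_thread.daemon = True
    world.server_thread.start()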
You don't necessarily need a WSGI server to unit-test a WSGI app; see the linked example.
For your acceptance tests, you could start the server in a separate process and send it SIGINT when you're done.
If you'd like to stick with your current setup, you could run the server in the main thread and your tests in background threads. Then you could send a KeyboardInterrupt to the main thread from the subthreads.
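A sketch of that inversion, assuming waitress's serve() runs in the main thread and thread.interrupt_main() is used to raise KeyboardInterrupt there once the suite finishes (app stands for the WSGI application from the question; the test entry point is left as a placeholder):
import thread          # "_thread" on Python 3
import threading
from waitress import serve

def run_tests_and_stop():
    try:
        pass           # run the lettuce/splinter suite here
    finally:
        # Raises KeyboardInterrupt in the main thread, which waitress
        # treats as a request to shut down cleanly.
        thread.interrupt_main()

threading.Thread(target=run_tests_and_stop).start()
serve(app, host='0.0.0.0', port=7654)   # blocks in the main thread until interrupted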
Is the problem just that the process won't shut down?
It's been a while since I've used Python for web development, so I have no experience with these libraries, but if shutting down is the problem, it looks like a good scenario for daemon threads.
Mentioned here:
http://docs.python.org/library/threading.html

Python:Django: Signal handler and main thread

I am building a django application which depends on a python module where a SIGINT signal handler has been implemented.
Assuming I cannot change the module I depend on, how can I work around the "signal only works in main thread" error I get when integrating it in Django?
Can I run it on the Django main thread?
Is there a way to inhibit the handler so that the module can run on non-main threads?
Thanks!
Django's built-in development server has an auto-reload feature enabled by default, which spawns a new thread as a means of reloading code. To work around this you can simply do the following, although you'd obviously lose the convenience of auto-reloading:
python manage.py runserver --noreload
You'll also need to be mindful of this when choosing your production setup. At least some of the deployment options (such as threaded FastCGI) are certain to execute your code outside the main thread.
I use Python 3.5 and Django 1.8.5 in my project, and I ran into a similar problem recently. I can easily run my xxx.py code with its signal handling directly, but it can't be executed on Django as a package, precisely because of the error "signal only works in main thread".
Firstly, running the server with --noreload --nothreading works, but it makes my multi-threaded code far too slow.
Secondly, I found that code in the __init__.py of my package does run in the main thread. But since only the main thread can catch the signal, code elsewhere in the package can't catch it at all. That didn't solve my problem, although it may be a solution for you.
Finally, I found that Python has a built-in module named subprocess. With it you can run a real, complete child process, which has its own main thread, so you can use your signal-handling code there without trouble. I don't know what the performance cost is, but it works well for me. PS: you can find all the details about subprocess in the Python documentation.
Thank you~
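A minimal illustration of the subprocess approach described in that answer (worker.py is a hypothetical script containing the signal-dependent code):
import subprocess

# The child process gets its own main thread, so signal.signal() calls
# made inside worker.py succeed even when this is invoked from a
# non-main thread of the Django server.
output = subprocess.check_output(["python", "worker.py"])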
There is a cleaner way that doesn't break your ability to use threads and processes.
Put your registration calls in manage.py:
import os
import signal
import sys
import threading

def handleKill(signum, frame):
    print "Killing Thread."
    # Or whatever code you want here
    ForceTerminate.FORCE_TERMINATE = True   # application-specific shutdown flag
    print threading.active_count()
    sys.exit(0)

if __name__ == "__main__":
    os.environ.setdefault("DJANGO_SETTINGS_MODULE", "mysite.settings")
    from django.core.management import execute_from_command_line
    signal.signal(signal.SIGINT, handleKill)
    signal.signal(signal.SIGTERM, handleKill)
    execute_from_command_line(sys.argv)
Although the question does not describe exactly the situation you are in, here is some more generic advice:
The signal is only sent to the main thread. For this reason, the signal handler should be in the main thread.
From that point on, the action that the signal triggers, needs to be communicated to the other threads. I usually do this using Events. The signal handler sets the event, which the other threads will read, and then realize that action X has been triggered. Obviously this implies that the event attribute should be shared among the threads.
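A small self-contained sketch of that pattern: the handler, registered in the main thread, sets a shared threading.Event that the worker threads check.
import signal
import threading
import time

shutdown_requested = threading.Event()

def handle_sigint(signum, frame):
    # Runs in the main thread; just record that shutdown was requested.
    shutdown_requested.set()

def worker():
    while not shutdown_requested.is_set():
        time.sleep(0.1)   # stand-in for a unit of real work
    print("worker saw the shutdown event, exiting cleanly")

signal.signal(signal.SIGINT, handle_sigint)   # must be called from the main thread
threading.Thread(target=worker).start()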

Apache + mod_wsgi interaction

Before posting this, I have read quite a few resources online, including the mod_wsgi wiki, but I am confused about how exactly Apache processes/threads interact with mod_wsgi.
This is my current understanding: Apache can be configured to run such that one or more child processes can handle incoming requests, and each of these child processes can be configured to in turn use one or more threads to service requests. After that, things start getting hazy for me. My doubts are:
What is a WSGIDaemonProcess, and who actually calls my Django app using the python sub interpreter?
If I have my Django app running in a mode where multiple threads are allowed in a single Apache child process, does that mean multiple requests could be accessing my app simultaneously? If so, could something like setting a module-level variable (say, a user's ID) be overwritten by other parallel requests and lead to non-thread-safe behavior?
For the case above, with Python's global interpreter lock, would the threads actually be executing in parallel?
Answers to each of the points.
1 - WSGIDaemonProcess/WSGIProcessGroup indicate that mod_wsgi should fork off a separate process for running the WSGI application in. This is a fork only and not a fork/exec, so mod_wsgi is still in control of it. When a URL is detected to map to a WSGI application running in daemon mode, the mod_wsgi code in the Apache child worker process proxies the request details through to the daemon-mode process, where the mod_wsgi code there reads them and calls up into your WSGI application.
2 - Yes, multiple requests can be operating concurrently and be wanting to modify the module global data at the same time.
3 - For the time that execution is within Python itself, no, they aren't strictly running in parallel, because the global interpreter lock means that only one thread can be executing Python code at a time. The Python interpreter periodically switches which thread gets to run. If one of the threads calls into C code and releases the GIL, then at least while it is in that state it can run in parallel with other threads running Python or C code. As an example, when calls are made down into the Apache/mod_wsgi layer to write back response data, the GIL is released, so the actual writing of response data at the lower layers does not prevent other threads from running.
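To illustrate point 2, here is the difference between a module-level variable shared by all request threads and per-thread storage (an illustrative sketch, not mod_wsgi-specific code):
import threading

current_user_id = None                 # module-level: shared by every thread

_request_state = threading.local()     # one independent copy per thread

def handle_request(user_id):
    global current_user_id
    current_user_id = user_id          # racy: another request may overwrite it
    _request_state.user_id = user_id   # safe: visible only to this thread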

Is it ok to spawn threads in a wsgi-application?

To achieve something similar to Google App Engine's 'deferred calls' (i.e., the request is handled, and afterwards the deferred task is handled), I experimented a little and came up with the solution of spawning a thread in which my deferred call is handled.
I am now trying to determine if this is an acceptable way.
Is it possible (according to the WSGI specification) that the process is terminated by the webserver after the actual request is handled, but before all threads run out?
(if there's a better way, that would be also fine)
WSGI does not specify the lifetime of an application process (a WSGI application is just a Python callable object). You can run it in a way that is completely independent of the web server, in which case only you control the lifetime.
There is also nothing in WSGI that would prohibit you from spawning threads, or processes, or doing whatever the hell you want.
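For example, a WSGI application is free to hand work off to a background thread before returning (a sketch only; whether the process lives long enough to finish that work is a separate question, as discussed below):
import threading

def deferred_task(path):
    pass   # whatever post-request work needs doing

def application(environ, start_response):
    t = threading.Thread(target=deferred_task, args=(environ["PATH_INFO"],))
    t.daemon = True    # don't block interpreter exit on this work
    t.start()
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"queued\n"]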
FWIW, also have a read of:
http://code.google.com/p/modwsgi/wiki/RegisteringCleanupCode
Hooking actions onto close() of the returned iterable is the only way, within the context of the WSGI specification itself, of doing deferred work. That doesn't happen in a separate thread, though; it occurs within the context of the actual request, albeit after the response is supposed to have been flushed back to the client. Your deferred action will therefore consume that request thread until the work is complete, and so that request thread will not be able to handle other requests until then.
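The close() hook looks roughly like this (a sketch of the pattern described on the linked wiki page, not a copy of it):
class ExecuteOnClose(object):
    """Wrap the response iterable so extra work runs when it is closed."""
    def __init__(self, iterable, callback):
        self._iterable = iterable
        self._callback = callback

    def __iter__(self):
        for item in self._iterable:
            yield item

    def close(self):
        try:
            if hasattr(self._iterable, 'close'):
                self._iterable.close()
        finally:
            self._callback()   # the deferred work, still on the request thread

def deferred_work_middleware(app, callback):
    def wrapper(environ, start_response):
        return ExecuteOnClose(app(environ, start_response), callback)
    return wrapper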
In general, if you do use background threads, there is no guarantee that any hosting mechanism will wait for those background threads to complete before shutting the process down. In fact, I can't even think of any standard deployment mechanism which does wait. There isn't really even a guarantee that atexit handlers will be called on process shutdown, something that the referenced documentation also briefly talks about.

Twisted network client with multiprocessing workers?

So, I've got an application that uses Twisted + Stomper as a STOMP client which farms out work to a multiprocessing.Pool of workers.
This appears to work ok when I just use a python script to fire this up, which (simplified) looks something like this:
# stompclient.py
import logging
import logging.config
import multiprocessing

import twisted.python.log
from twisted.internet import reactor

# StompClientFactory, config_path, processes, etc. are defined elsewhere
# in the (simplified) application.
logging.config.fileConfig(config_path)
logger = logging.getLogger(__name__)

# Add observer to make Twisted log via python
twisted.python.log.PythonLoggingObserver().start()

# initialize the process pool (child processes get forked off immediately)
pool = multiprocessing.Pool(processes=processes)

StompClientFactory.username = username
StompClientFactory.password = password
StompClientFactory.destination = destination

reactor.connectTCP(host, port, StompClientFactory())
reactor.run()
As this gets packaged for deployment, I thought I would take advantage of the twistd script and run this from a tac file.
Here's my very-similar-looking tac file:
# stompclient.tac
import logging
import logging.config
import multiprocessing

import twisted.python.log
from twisted.application import internet, service

# StompClientFactory, config_path, processes, etc. are defined elsewhere
# in the (simplified) application.
logging.config.fileConfig(config_path)
logger = logging.getLogger(__name__)

# Add observer to make Twisted log via python
twisted.python.log.PythonLoggingObserver().start()

# initialize the process pool (child processes get forked off immediately)
pool = multiprocessing.Pool(processes=processes)

StompClientFactory.username = username
StompClientFactory.password = password
StompClientFactory.destination = destination

application = service.Application('myapp')
service = internet.TCPClient(host, port, StompClientFactory())
service.setServiceParent(application)
For the sake of illustration, I have collapsed or changed a few details; hopefully they were not the essence of the problem. For example, my app has a plugin system, the pool is initialized by a separate method, and then work is delegated to the pool using pool.apply_async() passing one of my plugin's process() methods.
So, if I run the script (stompclient.py), everything works as expected.
It also appears to work OK if I run twistd in non-daemon mode (-n):
twistd -noy stompclient.tac
however, it does not work when I run in daemon mode:
twistd -oy stompclient.tac
The application appears to start up OK, but when it attempts to fork off work, it just hangs. By "hangs", I mean that the child process never appears to be asked to do anything and the parent (which called pool.apply_async()) just sits there waiting for the response to return.
I'm sure that I'm doing something stupid with Twisted + multiprocessing, but I'm really hoping that someone can explain to me the flaw in my approach.
Thanks in advance!
Since the difference between your working invocation and your non-working invocation is only the "-n" option, it seems most likely that the problem is caused by the daemonization process (which "-n" prevents from happening).
On POSIX, one of the steps involved in daemonization is forking and having the parent exit. Among other things, this has the consequence of having your code run in a different process than the one in which the .tac file was evaluated. It also rearranges the child/parent relationship of processes started from the .tac file, as your pool of multiprocessing processes was.
The multiprocessing pool's processes start off parented to the twistd process you launch. However, when that process exits as part of daemonization, their parent becomes the system init process. This may cause some problems, although probably not the hanging problem you described. There are probably other, similarly low-level implementation details which normally allow the multiprocessing module to work but which are disrupted by the daemonization process.
Fortunately, avoiding this strange interaction should be straightforward. Twisted's service APIs allow you to run code after daemonization has completed. If you use these APIs, then you can delay the initialization of the multiprocessing module's process pool until after daemonization and hopefully avoid the problem. Here's an example of what that might look like:
import multiprocessing
from twisted.application.service import Service

class MultiprocessingService(Service):
    def startService(self):
        # Called only after twistd has finished daemonizing.
        self.pool = multiprocessing.Pool(processes=processes)

MultiprocessingService().setServiceParent(application)
Now, separately, you may also run into problems relating to cleanup of the multiprocessing module's child processes, or possibly issues with processes created with Twisted's process-creation API, reactor.spawnProcess. This is because dealing with child processes correctly generally involves handling the SIGCHLD signal. Twisted and multiprocessing aren't going to cooperate in this regard, though, so one of them will be notified of all children exiting and the other will never be notified. If you don't use Twisted's API for creating child processes at all, this may be okay for you, but you might want to check that any signal handler the multiprocessing module tries to install actually "wins" and doesn't get replaced by Twisted's own handler.
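One way to spot that conflict (a diagnostic sketch, not a fix) is to check which SIGCHLD handler is actually installed once both the reactor and the pool are up:
import signal

# signal.SIG_DFL here would mean neither library's handler is active;
# otherwise the value shows whose handler "won".
print("SIGCHLD handler: %r" % (signal.getsignal(signal.SIGCHLD),))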
A possible idea for you...
When running in daemon mode, twistd will close stdin, stdout and stderr. Does anything that your clients do read from or write to these?
